Abstract

The article contains an overview of locally stationary processes. At the beginning
time varying autoregressive processes are discussed in detail - both as a deep example
and an important class of locally stationary processes. In the next section a general
framework for time series with time varying finite dimensional parameters is discussed
with special emphasis on nonlinear locally stationary processes. Then the paper focuses
on linear processes where a more general theory is possible. First a general definition 
for linear processes is given and time varying spectral densities are discussed in detail. 
Then the Gaussian likelihood theory is presented for locally stationary processes. In 
the next section the relevance of empirical spectral processes for locally stationary time 
series is discussed. Empirical spectral processes play a major role in proving theoretical 
results and provide a deeper understanding of many techniques. The article concludes 
with an overview of other results for locally stationary processes. 
1 Introduction



Stationarity has played a major role in time series analysis for several decades. For stationary
processes there exists a large variety of models and powerful methods, such as bootstrap
methods or methods based on the spectral density. Furthermore, there are important mathematical
tools such as the ergodic theorem or several central limit theorems. As an example
we mention the likelihood theory for Gaussian processes which is well developed.

During recent years the focus has turned to nonstationary time series. Here the situation is 
more difficult: First, there exists no natural generalization from stationary to nonstationary 
time series and second, it is often not clear how to set down a meaningful asymptotics for 
nonstationary processes. Exceptions are nonstationary models which are generated by
a time invariant generation mechanism - for example integrated or cointegrated models.
These models have attracted a lot of attention during recent years. For general nonstationary
processes ordinary asymptotic considerations are often contradictory to the idea of
nonstationarity since future observations of a nonstationary process may not contain any 
information at all on the probabilistic structure of the process at present. For this reason 
the theory of locally stationary processes is based on infill asymptotics originating from 
nonparametric statistics. 

As a consequence valuable asymptotic concepts such as consistency, asymptotic normality, 
efficiency, LAN-expansions, neglecting higher order terms in Taylor expansions, etc. can 
be used in the theoretical treatment of statistical procedures for such processes. This leads 
to several meaningful results also for the original non-rescaled case such as the comparison 
of different estimates, the approximations for the distribution of estimates and bandwidth 
selection (for a detailed example see Remark 2.3). 

The type of processes which can be described with this infill asymptotics are processes 
which locally at each time point are close to a stationary process but whose characteristics 
(covariances, parameters, etc.) are gradually changing in an unspecific way as time evolves. 
The simplest example for such a process may be an AR(p)-process whose parameters are 
varying in time. The infill asymptotic approach means that time is rescaled to the unit 
interval. For time varying AR-processes this is explained in detail in the next section. 
Another example are GARCH-processes which have recently been investigated by several
authors - see Section 3.

The idea of having locally approximately a stationary process was also the starting point
of Priestley's theory of processes with evolutionary spectra (Priestley (1965) - see also
Priestley (1988), Granger and Hatanaka (1964), Tjøstheim (1976) and Mélard and Herteleer-de-Schutter
(1989) among others). Priestley considered processes having a time varying
spectral representation

$$X_t = \int_{-\pi}^{\pi} \exp(i\lambda t)\, A_t(\lambda)\, d\xi(\lambda), \qquad t \in \mathbb{Z},$$

with an orthogonal increment process $\xi(\lambda)$ and a time varying transfer function $A_t(\lambda)$.
(Priestley mainly looked at continuous time processes, but the theory is the same.) Also
within this approach asymptotic considerations (e.g. for judging the efficiency of a local
covariance estimator) are not possible or meaningless from an applied view. Using the
above mentioned infill asymptotics means in this case basically to replace $A_t(\lambda)$ with some
function $A(t/T, \lambda)$ - see (78).



Beyond the above cited references on processes with evolutionary spectra there has also been 
work on processes with time varying parameters which does not use the infill asymptotics 
discussed in this paper (cf. Subba Rao (1970); Hallin (1986) among others). Furthermore, 
there have been several papers on inference for processes with time varying parameters - 
mainly within the engineering literature (cf. Grenier (1983), Kayhan et al. (1994) among
others). 

The paper is organized as follows: In Section 2 we start with time varying autoregressive
processes as a deep example and an important class of locally stationary processes. There
we mark many principles and problems addressed at later stages with higher generality.
In Section 3 we present a more general framework for time series with time varying finite
dimensional parameters and show how nonparametric inference can be done and theoretically
handled. We also introduce derivative processes which play a major role in the derivations.
The results cover in particular nonlinear processes such as GARCH-processes with time
varying parameters.

If one restricts to linear processes or even more to Gaussian processes then a much more
general theory is possible which is developed in the subsequent sections. In Section 4 we give
a general definition for linear processes and discuss time varying spectral densities in detail.
Section 5 then contains the Gaussian likelihood theory for locally stationary processes. In
Section 6 we discuss the relevance of empirical spectral processes for locally stationary time
series. Empirical spectral processes play a major role in proving theoretical results and 
provide a deeper understanding of many techniques. 
2 Time varying autoregressive processes - a deep example



We now discuss time varying autoregressive processes in detail. In particular we mark many
principles and problems addressed at later stages with higher generality. Consider the time
varying AR(1) process

$$X_t + \alpha_t X_{t-1} = \sigma_t\, \varepsilon_t \quad \text{with } \varepsilon_t \text{ iid } \mathcal{N}(0,1). \qquad (1)$$

We now apply infill asymptotics, that is we rescale the parameter curves $\alpha_t$ and $\sigma_t$ to the
unit interval. This means that we replace them by $\alpha(t/T)$ and $\sigma(t/T)$ with curves $\alpha(\cdot) : [0,1] \to (-1,1)$
and $\sigma(\cdot) : [0,1] \to (0,\infty)$, leading in the general AR(p)-case to the definition given
in (2) below. Formally this results in replacing $X_t$ by a triangular array of observations
$(X_{t,T};\ t = 1,\dots,T;\ T \in \mathbb{N})$ where $T$ is the sample size.

We now indicate again the reason for this rescaling. Suppose we fit the parametric model
$\alpha_{\theta,t} := b + ct + dt^2$ to the nonrescaled model (1) which we assume to be observed for
$t = 1,\dots,T$. It is easy to construct different estimators for the parameters (e.g. the least
squares estimator, the maximum likelihood estimator or a moment estimator) but it is nearly
impossible to derive the finite sample properties of these estimators. On the other hand classical
non-rescaled asymptotic considerations for comparing these estimators make no sense
since with $t \to \infty$ also $\alpha_{\theta,t} \to \infty$ while e.g. $|\alpha_t|$ may be less than one within the observed
segment - i.e. the resulting asymptotic results are without any relevance for the observed
stretch of data. By rescaling $\alpha_t$ and $\sigma_t$ to the unit interval as described above we overcome
these problems. As $T$ tends to infinity more and more observations of each local structure
become available and we obtain a reasonable framework for a meaningful asymptotic analysis
of statistical procedures, which allows us to retain such powerful tools as consistency, asymptotic
normality, efficiency, LAN-expansions, etc. for nonstationary processes. For example the
results on asymptotic normality of an estimator obtained in this framework may be used to
approximate the distribution of the estimator in the finite sample situation. It is important
to note that classical asymptotics for stationary processes arises as a special case of this
infill asymptotics in the case where all parameter curves are constant.

Unfortunately infill asymptotics does not describe the physical behavior of the process as
$T \to \infty$. This may be unusual for time series analysis but it has been common in other
branches of statistics for many years. We remark that all statistical methods and procedures
stay the same or can easily be translated from the rescaled processes to the original non-rescaled
processes. A more complicated example on how the results of the rescaled case
transfer to the non-rescaled case is given in Remark 2.3.
Figure 1: T=128 realizations of a time varying AR(2)-model

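In the following we therefore consider time varying autoregressive (tvAR(p)) processes defined by

$$X_{t,T} + \sum_{j=1}^{p} \alpha_j\Big(\frac{t}{T}\Big)\, X_{t-j,T} = \sigma\Big(\frac{t}{T}\Big)\, \varepsilon_t, \qquad t \in \mathbb{Z}, \qquad (2)$$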



where the $\varepsilon_t$ are independent random variables with mean zero and variance 1. We assume
$\sigma(u) = \sigma(0)$, $\alpha_j(u) = \alpha_j(0)$ for $u < 0$ and $\sigma(u) = \sigma(1)$, $\alpha_j(u) = \alpha_j(1)$ for $u > 1$. In addition
we usually assume some smoothness conditions on $\sigma(\cdot)$ and the $\alpha_j(\cdot)$. Furthermore one may
include a time varying mean by replacing $X_{t-j,T}$ in (2) by $X_{t-j,T} - \mu\big(\frac{t-j}{T}\big)$ - see Section 7.

In some neighborhood of a fixed time point $u_0 = t_0/T$ the process $X_{t,T}$ can be approximated
by the stationary process $X_t(u_0)$ defined by

$$X_t(u_0) + \sum_{j=1}^{p} \alpha_j(u_0)\, X_{t-j}(u_0) = \sigma(u_0)\, \varepsilon_t, \qquad t \in \mathbb{Z}. \qquad (3)$$



It can be shown (see Section 3) that we have under suitable regularity conditions

$$\big| X_{t,T} - X_t(u_0) \big| = O_p\Big( \Big| \frac{t}{T} - u_0 \Big| + \frac{1}{T} \Big) \qquad (4)$$



which justifies the notation "locally stationary process". $X_{t,T}$ has a unique time varying
spectral density which is locally the same as the spectral density of $X_t(u)$, namely

$$f(u,\lambda) := \frac{\sigma^2(u)}{2\pi}\, \Big| 1 + \sum_{j=1}^{p} \alpha_j(u) \exp(-ij\lambda) \Big|^{-2} \qquad (5)$$

(see Example 4.2). Furthermore it has locally in some sense the same autocovariance

$$c(u,k) := \int_{-\pi}^{\pi} e^{ik\lambda} f(u,\lambda)\, d\lambda, \qquad k \in \mathbb{Z},$$

since $\mathrm{cov}(X_{[uT],T}, X_{[uT]+k,T}) = c(u,k) + O(T^{-1})$ uniformly in $u$ and $k$ (cf. (73)). This justifies
calling $c(u,k)$ the local covariance function of $X_{t,T}$ at time $u = t/T$.

Figure 2: True and estimated time varying spectrum of a tvAR(2)-process 

As an example Figure 1 shows T = 128 observations of a tvAR(2)-process with mean 0
and parameters $\sigma(u) = 1$, $\alpha_1(u) = -1.8\cos(1.5 - \cos 4\pi u)$, $\alpha_2(u) = 0.81$ and Gaussian
innovations $\varepsilon_t$. The parameters are chosen in a way such that for fixed $u$ the complex roots
of the characteristic polynomial are $\frac{1}{0.9}\exp[\pm i(1.5 - \cos 4\pi u)]$, that is they are close to the
unit circle and their phase varies cyclically with $u$. As could be expected from these roots
the observations show a periodic behavior with time varying period-length. The left picture
of Figure 2 shows the true time varying spectrum of the process. One clearly sees that the
location of the peak is also time varying (it is located at frequency $1.5 - \cos 4\pi u$).
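To make the example concrete, the following sketch simulates one path of the tvAR(2)-process described above and evaluates the time varying spectral density (5) on a frequency grid. It is only an illustration under stated assumptions: the function names, the burn-in length, the random seed and the frequency grid are choices made here and are not taken from the paper.

```python
import numpy as np

def alpha1(u):
    # alpha_1(u) = -1.8 cos(1.5 - cos(4 pi u)), as in the example above
    return -1.8 * np.cos(1.5 - np.cos(4 * np.pi * u))

ALPHA2 = 0.81  # constant second coefficient
SIGMA = 1.0    # constant innovation standard deviation

def simulate_tvar2(T, rng, burn_in=200):
    """Simulate X_{t,T} + a1(t/T) X_{t-1,T} + a2 X_{t-2,T} = sigma * eps_t."""
    eps = rng.standard_normal(T + burn_in)
    x = np.zeros(T + burn_in)
    for i in range(2, T + burn_in):
        t = i - burn_in + 1            # ordinary time; t <= 0 during the burn-in
        u = min(max(t / T, 0.0), 1.0)  # curves are held constant outside [0, 1]
        x[i] = -alpha1(u) * x[i - 1] - ALPHA2 * x[i - 2] + SIGMA * eps[i]
    return x[burn_in:]

def tv_spectrum(u, lam):
    """Time varying spectral density (5) for this tvAR(2) example."""
    transfer = 1 + alpha1(u) * np.exp(-1j * lam) + ALPHA2 * np.exp(-2j * lam)
    return SIGMA**2 / (2 * np.pi) / np.abs(transfer) ** 2

rng = np.random.default_rng(0)         # seed chosen arbitrarily
x = simulate_tvar2(128, rng)

lam_grid = np.linspace(0.01, np.pi, 2000)
for u in (0.0, 0.25, 0.5):
    peak = lam_grid[np.argmax(tv_spectrum(u, lam_grid))]
    phase = 1.5 - np.cos(4 * np.pi * u)
    print(f"u = {u:.2f}: spectral peak near {peak:.3f}, root phase {phase:.3f}")
```

Because the roots have modulus 1/0.9 rather than 1, the peak of the spectrum sits very close to, but not exactly at, the phase of the roots.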

1. Local estimation by stationary methods on segments 

An ad-hoc method which works in nearly all cases for locally stationary processes is to do
inference via stationary methods on segments. The idea is that the process $X_{t,T}$ is almost
stationary on a reasonably small segment $\{t : |t/T - u_0| \le b/2\}$. The parameter of interest
(or the correlation, spectral density, etc.) is estimated by some classical method and the
resulting estimate is assigned to the midpoint $u_0$ of the segment. By shifting the segment
this finally leads to an estimate of the unknown parameter curve (time varying correlation,
time varying spectral density, etc.). An important modification of this method is obtained
when more weight is put on data in the center of the interval than at the edges. This can
often be achieved by using a data taper on the segment or by using a kernel type estimate.

Since we use observations from the process $X_{t,T}$ (instead of $X_t(u_0)$) the procedure causes a
bias which depends on the degree of non-stationarity of the process on the segment. It is
possible to evaluate this bias and to use the resulting expression for an optimal choice of the
segment length. To demonstrate this we now discuss the estimation of the AR coefficient
functions by classical Yule-Walker estimates on segments. Since the approximating process
$X_t(u_0)$ is stationary we obtain from (3) that the Yule-Walker equations hold locally at time
$u_0$, that is we have with $\alpha(u_0) := (\alpha_1(u_0), \dots, \alpha_p(u_0))'$

$$\alpha(u_0) = -R(u_0)^{-1}\, r(u_0) \quad \text{and} \quad \sigma^2(u_0) = c(u_0,0) + \alpha(u_0)'\, r(u_0) \qquad (6)$$

where $r(u_0) := (c(u_0,1), \dots, c(u_0,p))'$ and $R(u_0) := \{c(u_0, i-j)\}_{i,j=1,\dots,p}$.

To estimate $\alpha(u_0)$ we use the classical Yule-Walker estimator on the segment $[u_0T] - N/2 + 1, \dots, [u_0T] + N/2$
(ordinary time) or on $[u_0 - b_T/2,\, u_0 + b_T/2]$ (rescaled time with bandwidth
$b_T := N/T$), that is

$$\hat\alpha_T(u_0) = -\hat R_T(u_0)^{-1}\, \hat r_T(u_0) \quad \text{and} \quad \hat\sigma_T^2(u_0) = \hat c_T(u_0,0) + \hat\alpha_T(u_0)'\, \hat r_T(u_0) \qquad (7)$$

where $\hat r_T(u_0) := (\hat c_T(u_0,1), \dots, \hat c_T(u_0,p))'$ and $\hat R_T(u_0) := \{\hat c_T(u_0, i-j)\}_{i,j=1,\dots,p}$ with some
covariance estimator $\hat c_T(u_0,j)$.

Before we discuss the properties of this estimator we first discuss different covariance esti- 
mates and their properties. 
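A minimal sketch of the segment Yule-Walker estimator (7) may help to fix ideas. It assumes a mean zero series, an even segment length N and a tapered covariance estimate of the form sum over s of h(s/N) h((s+k)/N) X X / H_N on the segment; the function names, the cosine bell taper and the stationary AR(2) test series are illustrative choices made here, not part of the paper.

```python
import numpy as np

def tapered_cov(seg, k, taper):
    """Tapered lag-k covariance estimate on one segment (mean zero assumed)."""
    N = len(seg)
    h = taper(np.arange(1, N + 1) / N)   # h(s/N), s = 1..N
    H_N = np.sum(h**2)                   # normalising factor
    y = h * seg
    k = abs(k)
    return np.sum(y[k:] * y[:N - k]) / H_N

def local_yule_walker(x, u0, N, p, taper=lambda v: np.ones_like(v)):
    """Segment Yule-Walker estimate of (alpha(u0), sigma^2(u0)) as in (7)."""
    T = len(x)
    start = int(u0 * T) - N // 2         # 0-based index of the first segment point
    if start < 0 or start + N > T:
        raise ValueError("segment leaves the observed stretch of data")
    seg = x[start:start + N]
    c = np.array([tapered_cov(seg, k, taper) for k in range(p + 1)])
    R = np.array([[c[abs(i - j)] for j in range(p)] for i in range(p)])
    r = c[1:]
    alpha = -np.linalg.solve(R, r)       # alpha_hat = -R^{-1} r
    sigma2 = c[0] + alpha @ r            # sigma2_hat = c(0) + alpha_hat' r
    return alpha, sigma2

# Sanity check on a stationary AR(2): X_t - 0.5 X_{t-1} + 0.3 X_{t-2} = eps_t,
# i.e. alpha = (-0.5, 0.3) and sigma^2 = 1 in the sign convention of (2).
rng = np.random.default_rng(1)
T = 4000
x = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(2, T):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + eps[t]

cosine_bell = lambda v: 0.5 * (1.0 - np.cos(2.0 * np.pi * v))
alpha_hat, sigma2_hat = local_yule_walker(x, u0=0.5, N=2000, p=2, taper=cosine_bell)
print(alpha_hat, sigma2_hat)
```

Shifting u0 over a grid and repeating the call gives an estimate of the whole coefficient curve; the default argument reproduces the non-tapered estimate.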

2. Local covariance estimation 

The covariance estimate with data taper on the segment $[u_0T] - N/2 + 1, \dots, [u_0T] + N/2$ is

$$\hat c_T(u_0,k) := \frac{1}{H_N} \sum_{\substack{s,t=1 \\ s-t=k}}^{N} h\Big(\frac{s}{N}\Big)\, h\Big(\frac{t}{N}\Big)\, X_{[u_0T]-\frac{N}{2}+s,T}\; X_{[u_0T]-\frac{N}{2}+t,T} \qquad (8)$$

where $h : [0,1] \to \mathbb{R}$ is a data taper with $h(x) = h(1-x)$ and $H_N := \sum_{j=0}^{N-1} h^2\big(\frac{j}{N}\big) \sim N \int_0^1 h^2(x)\, dx$
is the normalizing factor. The data taper usually is largest at $x = 1/2$ and
decays slowly to 0 at the edges. For $h(x) = \chi_{(0,1]}(x)$ we obtain the classical non-tapered
covariance estimate.

An asymptotically equivalent (and from a certain viewpoint more intuitive) estimator is the
kernel density estimator

$$\tilde c_T(u_0,k) := \frac{1}{b_T T} \sum_{t} K\Big( \frac{u_0 - (t+k/2)/T}{b_T} \Big)\, X_{t,T}\, X_{t+k,T} \qquad (9)$$

where $K : \mathbb{R} \to [0,\infty)$ is a kernel with $K(x) = K(-x)$, $\int K(x)\,dx = 1$, $K(x) = 0$ for
$x \notin [-1/2, 1/2]$ and $b_T$ is the bandwidth. Also equivalent is

$$\tilde{\tilde c}_T(u_0,i,j) := \frac{1}{b_T T} \sum_{t} K\Big( \frac{u_0 - t/T}{b_T} \Big)\, X_{t-i,T}\, X_{t-j,T} \qquad (10)$$

with $i - j = k$ which appears in least squares regression - cf. Example 3.1 (i). If $K(x) = h(x)^2$
all three estimators are equivalent in the sense that they lead to the same asymptotic bias,
variance and mean squared error. For reasons of clarity a few remarks are in order:

1) The classical stationary method on a segment is in this case the estimator without data 
taper which is the same as the kernel estimator with a rectangular kernel. 

2) A first step towards a better estimate (as it is proved below) is to put higher weights in
the middle and lower weights at the edges of the observation domain in order to cope in a
better way with the nonstationarity of $X_{t,T}$ on the segment. In this context this may be
achieved either by using a kernel estimate or by using a data-taper, which is asymptotically equivalent.
This is straightforward for local covariance estimates and local Yule-Walker estimates and
can usually also be applied to other estimation problems.

3) Data-tapers have also been used for stationary time series (in particular in spectral
estimation, but also with Yule-Walker estimates and covariance estimation where they give
positive definite autocovariances with a lower bias). Thus the reason for using data-tapers
for segment estimates is twofold: reducing the bias due to nonstationarity on the segment 
and reducing the (classical) bias of the procedure as a stationary method. 

We now determine the mean squared error of the above estimators. Furthermore, we de- 
termine the optimal segment length and show that weighted estimates are better than 
ordinary estimates. 

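Theorem 2.1 Suppose $X_{t,T}$ is locally stationary with mean 0. Under suitable regularity
conditions (in particular second order smoothness of $c(\cdot,k)$) we have for $\hat c_T(u_0,k)$, $\tilde c_T(u_0,k)$
and $\tilde{\tilde c}_T(u_0,i,j)$ with $K(x) = h(x)^2$ and $b_T = N/T$

$$\text{(i)} \quad E\, \hat c_T(u_0,k) = c(u_0,k) + \frac{1}{2}\, b_T^2 \int_{-1/2}^{1/2} x^2 K(x)\, dx\; \Big[ \frac{\partial^2}{\partial u^2}\, c(u_0,k) \Big] + o(b_T^2) + O\Big( \frac{1}{b_T T} \Big)$$

and

$$\text{(ii)} \quad \mathrm{var}\big( \hat c_T(u_0,k) \big) = \frac{1}{b_T T} \int_{-1/2}^{1/2} K(x)^2\, dx \sum_{\ell=-\infty}^{\infty} c(u_0,\ell)\, \big[ c(u_0,\ell) + c(u_0,\ell+2k) \big] + o\Big( \frac{1}{b_T T} \Big).$$

Proof. (i) see Dahlhaus (1996c), (ii) is omitted (the form of the asymptotic variance is the
same as in the stationary case).

Note that the above bias of order $b_T^2$ is solely due to nonstationarity which is measured by
$\frac{\partial^2}{\partial u^2} c(u_0,k)$. If the process is stationary this second derivative is zero and the bias disappears.
The bandwidth $b_T$ may now be chosen to minimize the mean squared error.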
Remark 2.2 (Minimizing the mean squared error)

Let $\mu(u_0) := \frac{\partial^2}{\partial u^2} c(u_0,k)$, $\tau(u_0) := \sum_{\ell=-\infty}^{\infty} c(u_0,\ell)\,[c(u_0,\ell) + c(u_0,\ell+2k)]$, $d_K := \int x^2 K(x)\,dx$
and $v_K := \int K(x)^2\,dx$. Then we have for the mean squared error

$$E\, \big| \hat c_T(u_0,k) - c(u_0,k) \big|^2 = \frac{b^4}{4}\, d_K^2\, \mu(u_0)^2 + \frac{1}{bT}\, v_K\, \tau(u_0) + o\Big( b^4 + \frac{1}{bT} \Big). \qquad (11)$$

It can be shown (cf. Priestley, 1981, Chapter 7.5) that this MSE gets minimal for

$$K(x) = K_{opt}(x) = 6x(1-x), \qquad 0 \le x \le 1 \qquad (12)$$

and

$$b = b_{opt}(u_0) = C(K_{opt})^{1/5}\, \Big[ \frac{\tau(u_0)}{\mu(u_0)^2} \Big]^{1/5}\, T^{-1/5} \qquad (13)$$

where $C(K) = v_K / d_K^2$. In this case we have with $c(K) = v_K\, d_K^{1/2}$

$$T^{4/5}\, E\, \big| \hat c_T(u_0,k) - c(u_0,k) \big|^2 = \frac{5}{4}\, c(K_{opt})^{4/5}\, \mu(u_0)^{2/5}\, \tau(u_0)^{4/5} + o(1). \qquad (14)$$

Here $\mu(u_0) = \frac{\partial^2}{\partial u^2} c(u_0,k)$ measures the "degree of nonstationarity" while $\tau(u_0)$ measures the
variability of the estimate at time $u_0$. The segment length $N_{opt} = b_{opt} T$ gets larger if $\mu(u_0)$
gets smaller, i.e. if the process is closer to stationarity (in this case: if the $k$-th order
covariance is more constant / more linear in time). At the same time the mean squared error
decreases. The results are similar to kernel estimation in nonparametric regression. A yet
unsolved problem is how to adaptively determine the bandwidth from the observed process.
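The bandwidth formula (13) and the resulting mean squared error (14) can be checked numerically. The sketch below is only an illustration: it uses the kernel 6(1/4 - x^2) on [-1/2, 1/2], which is assumed here to be the centred version of the kernel in (12), and the values of mu, tau and T are hypothetical numbers chosen for the example.

```python
def kernel_constants(n_grid=100_000):
    """d_K = int x^2 K(x) dx and v_K = int K(x)^2 dx for K(x) = 6 (1/4 - x^2) on [-1/2, 1/2]."""
    dx = 1.0 / n_grid
    d_K = 0.0
    v_K = 0.0
    for i in range(n_grid):
        x = -0.5 + (i + 0.5) * dx        # midpoint rule
        K = 6.0 * (0.25 - x * x)
        d_K += x * x * K * dx
        v_K += K * K * dx
    return d_K, v_K

def optimal_bandwidth(mu, tau, T, d_K, v_K):
    """b_opt from (13): (v_K / d_K^2)^(1/5) * (tau / mu^2)^(1/5) * T^(-1/5)."""
    return (v_K / d_K**2) ** 0.2 * (tau / mu**2) ** 0.2 * T ** (-0.2)

def mse(b, mu, tau, T, d_K, v_K):
    """Leading terms of the mean squared error (11)."""
    return b**4 / 4.0 * d_K**2 * mu**2 + v_K * tau / (b * T)

d_K, v_K = kernel_constants()
mu, tau, T = 20.0, 1.5, 512              # hypothetical values, for illustration only
b_opt = optimal_bandwidth(mu, tau, T, d_K, v_K)
print("d_K, v_K:", d_K, v_K)
print("b_opt:", b_opt, "segment length N_opt:", b_opt * T)
print("MSE at b_opt:", mse(b_opt, mu, tau, T, d_K, v_K))
```

A larger value of mu (stronger nonstationarity) shortens the optimal segment, while a larger tau (more variability) lengthens it, in line with the discussion above.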

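3. Segment selection and asymptotic mean squared-error for local Yule-Walker estimates

For the local Yule-Walker estimates from (7) with the covariances $\hat c_T(u_0,k)$ as defined in (8)
Dahlhaus and Giraitis (1998) have proved (see also Example 3.7)

$$E\, \hat\alpha_T(u_0) = \alpha(u_0) - \frac{b^2}{2}\, d_K\, \mu(u_0) + o(b^2)$$

with

$$\mu(u_0) = R(u_0)^{-1} \Big[ \Big( \frac{\partial^2}{\partial u^2} R(u_0) \Big)\, \alpha(u_0) + \frac{\partial^2}{\partial u^2}\, r(u_0) \Big]$$

and

$$\mathrm{var}\big( \hat\alpha_T(u_0) \big) = \frac{1}{bT}\, v_K\, \sigma^2(u_0)\, R(u_0)^{-1} + o\Big( \frac{1}{bT} \Big).$$

Thus, we obtain for $E\, \| \hat\alpha_T(u_0) - \alpha(u_0) \|^2$ the same expression as in (11) with $\tau(u_0) = \sigma^2(u_0)\, \mathrm{tr}\{R(u_0)^{-1}\}$
and $\mu(u_0)^2$ replaced by $\|\mu(u_0)\|^2$. With these changes the optimal bandwidth
is given by (13) and the optimal mean squared error by (14).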
Remark 2.3 (Implications for non-rescaled processes) Suppose that we observe data
from a (non-rescaled) tvAR(p)-process

$$X_t + \sum_{j=1}^{p} \alpha_{t,j}\, X_{t-j} = \sigma_t\, \varepsilon_t, \qquad t \in \mathbb{Z}. \qquad (15)$$

In order to estimate $\alpha_t$ at some time $t_0$ we may use the segment Yule-Walker estimator as
given in (7). The theoretically optimal segment length is given by (13) as

$$N_{opt}(u_0) = C(K_{opt})^{1/5}\, \Big[ \frac{\tau(u_0)}{\|\mu(u_0)\|^2} \Big]^{1/5}\, T^{4/5} \qquad (16)$$

which at first sight depends on $T$ and the rescaling.

Suppose that we have parameter functions $\tilde\alpha_j(\cdot)$ and some $T > t_0$ with $\tilde\alpha_j\big(\frac{t_0}{T}\big) = \alpha_j(t_0)$ (i.e.
the original function has been rescaled to the unit interval) and we denote by $\tilde R$, $\tilde r$ and $\tilde\alpha$
the corresponding parameters in the rescaled world (i.e. $\tilde R(u_0) = R(t_0)$ etc.). Then

$$\tau(u_0) = \tilde\sigma^2(u_0)\, \mathrm{tr}\{ \tilde R(u_0)^{-1} \} = \sigma^2(t_0)\, \mathrm{tr}\{ R(t_0)^{-1} \}$$

and (with the second order difference as an approximation of the second derivative)

$$\mu(u_0) = \tilde R(u_0)^{-1} \Big[ \Big( \frac{\partial^2}{\partial u^2} \tilde R(u_0) \Big)\, \tilde\alpha(u_0) + \frac{\partial^2}{\partial u^2}\, \tilde r(u_0) \Big]
\approx R(t_0)^{-1} \Big[ \frac{R(t_0) - 2R(t_0-1) + R(t_0-2)}{1/T^2}\, \alpha(t_0) + \frac{r(t_0) - 2r(t_0-1) + r(t_0-2)}{1/T^2} \Big].$$

Plugging this into (16) reveals that $T$ drops out completely and the optimal segment length
can completely be determined in terms of the original non-rescaled process. This is a nice
example on how the asymptotic considerations in the rescaled world can be transferred with
benefit to the original non-rescaled world. □

These considerations justify the asymptotic approach of this paper: While it is not possible
to set down a meaningful asymptotic theory for the non-rescaled model (1), an approach
using the rescaled model (2) leads to meaningful results also for the model (1). Another
example for this relevance is the construction of confidence intervals for the local Yule-Walker
estimates from the central limit theorem in Dahlhaus and Giraitis (1998), Theorem 3.2.

4. Parametric Whittle-type estimates - a first approach 

We now assume that the p+1-dimensional parameter curve $\theta(\cdot) = (\alpha_1(\cdot), \dots, \alpha_p(\cdot), \sigma^2(\cdot))'$
is parameterized by a finite dimensional parameter $\eta \in \mathbb{R}^q$, that is $\theta(\cdot) = \theta_\eta(\cdot)$. An example
studied below is where the AR-coefficients are modeled by polynomials. Another example is
where the AR-coefficients are modeled by a parametric transition curve as in Section 2.6 (iv).
In particular when the length of the time series is short this may be a proper choice. We
now show how the stationary Whittle likelihood can be generalized to the locally stationary
case (another generalization is given in (89)).

If we were looking for a nonparametric estimate for the parameter curve $\theta(\cdot)$ we could apply
the stationary Whittle estimate on a segment leading to

$$\hat\theta_T(u_0) := \mathop{\mathrm{argmin}}_{\theta}\; \mathcal{L}_T(u_0,\theta) \qquad (17)$$

with the Whittle likelihood

$$\mathcal{L}_T(u_0,\theta) := \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big\{ \log 4\pi^2 f_\theta(\lambda) + \frac{I_T(u_0,\lambda)}{f_\theta(\lambda)} \Big\}\, d\lambda \qquad (18)$$

with the tapered periodogram on a segment about $u_0$, that is

$$I_T(u_0,\lambda) := \frac{1}{2\pi H_N}\, \Big| \sum_{s=1}^{N} h\Big(\frac{s}{N}\Big)\, X_{[u_0T]-N/2+s,T}\, \exp(-i\lambda s) \Big|^2. \qquad (19)$$

Here $h(\cdot)$ is a data taper as in (8). For $h(x) = \chi_{(0,1]}(x)$ we obtain the non-tapered periodogram.
The properties of this nonparametric estimate are discussed later - in particular in
Example 3.6 and at the end of Example 6.6. In case of a tvAR(p)-process $\hat\theta_T(u_0)$ is exactly
the local Yule-Walker estimate defined in (7) with the covariance-estimate given in (8).

Suppose now that we want to fit globally the parametric model $\theta(\cdot) = \theta_\eta(\cdot)$ to the data,
that is we have the time varying spectrum $f_\eta(u,\lambda) := f_{\theta_\eta(u)}(\lambda)$. Since $\mathcal{L}_T(u,\theta)$ is an
approximation of the Gaussian log-likelihood on the segment $\{[uT] - N/2 + 1, \dots, [uT] + N/2\}$
a reasonable approach is to use

$$\hat\eta_T := \mathop{\mathrm{argmin}}_{\eta}\; \mathcal{L}_T^{BW}(\eta) \qquad (20)$$

with the block Whittle likelihood

$$\mathcal{L}_T^{BW}(\eta) := \frac{1}{4\pi}\, \frac{1}{M} \sum_{j=1}^{M} \int_{-\pi}^{\pi} \Big\{ \log 4\pi^2 f_\eta(u_j,\lambda) + \frac{I_T(u_j,\lambda)}{f_\eta(u_j,\lambda)} \Big\}\, d\lambda. \qquad (21)$$

Here $u_j := t_j/T$ with $t_j := S(j-1) + N/2$ $(j = 1, \dots, M)$, i.e. we calculate the likelihood
on overlapping segments which we shift each time by $S$. Furthermore $T = S(M-1) + N$.
A better justification of the form of the likelihood is provided by the asymptotic Kullback-Leibler
information divergence derived in Theorem 4.4.
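The block Whittle likelihood can be evaluated numerically along the following lines. This is a sketch under stated assumptions rather than the implementation used for the results reported here: the integral over frequencies is replaced by a Riemann sum on a midpoint grid, a cosine bell is used as data taper, and the function names and the white noise check are illustrative choices.

```python
import numpy as np

def cosine_taper(v):
    """Cosine bell h(x) = (1 - cos(2 pi x)) / 2, which satisfies h(x) = h(1 - x)."""
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * v))

def segment_periodogram(x, start, N, lam, taper=cosine_taper):
    """Tapered periodogram on x[start:start+N] at the frequencies lam."""
    s = np.arange(1, N + 1)
    h = taper(s / N)
    H_N = np.sum(h**2)
    dft = np.exp(-1j * np.outer(lam, s)) @ (h * x[start:start + N])
    return np.abs(dft) ** 2 / (2.0 * np.pi * H_N)

def block_whittle(x, N, S, spec, n_freq=256):
    """Block Whittle likelihood for a spectral density spec(u, lam)."""
    T = len(x)
    M = (T - N) // S + 1                   # number of segments, T = S (M - 1) + N
    lam = -np.pi + 2.0 * np.pi * (np.arange(n_freq) + 0.5) / n_freq
    total = 0.0
    for j in range(M):
        start = S * j                      # segment x[start : start + N]
        u_j = (start + N / 2) / T          # rescaled midpoint t_j / T
        f = spec(u_j, lam)
        I = segment_periodogram(x, start, N, lam)
        # (1 / 4 pi) * integral over [-pi, pi] ~ 0.5 * mean over the grid
        total += 0.5 * np.mean(np.log(4.0 * np.pi**2 * f) + I / f)
    return total / M

# Check on Gaussian white noise with variance 1: the true spectrum 1 / (2 pi)
# should give a smaller value than a spectrum with the wrong variance.
rng = np.random.default_rng(2)
x = rng.standard_normal(512)
white = lambda var: (lambda u, lam: np.full_like(lam, var / (2.0 * np.pi)))
value_true = block_whittle(x, N=64, S=16, spec=white(1.0))
value_wrong = block_whittle(x, N=64, S=16, spec=white(9.0))
print(value_true, value_wrong)
```

For a parametric model the function `spec` would be the spectrum of the fitted curves at parameter value eta, and the likelihood would be minimized over eta with a numerical optimizer.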
0.929   0.929   0.916
0.888   0.901   0.888
0.669   0.678   0.694
0.685   0.694   0.709
0.673   0.682   0.697
0.689   0.698   0.712

Table 1: Values for AIC for p = 2 and different polynomial orders



As discussed above the reason for using data-tapers is twofold: they reduce the bias due 
to nonstationarity on the segment and they reduce the leakage (already known from the 
stationary case). It is remarkable that the taper in this case does not lead to an increase of 
the asymptotic variance if the segments are overlapping (cf. Dahlhaus (1997), Theorem 3.3). 

The properties of the above estimate are discussed in Dahlhaus (1997) including consistency, 
asymptotic normality, model selection and the behavior if the model is misspecified. The 
estimate is asymptotically efficient if $S/N \to 0$.

As an example we now fit a tvAR(p)-model to the data from Figure 1 and estimate the
parameters by minimizing $\mathcal{L}_T^{BW}(\eta)$. The AR-coefficients are modeled as polynomials with
different orders. Thus, we fit the model

$$\alpha_j(u) = \sum_{k=0}^{K_j} b_{jk}\, u^k, \qquad j = 1, \dots, p,$$

to the data. The model orders $p, K_1, \dots, K_p$ are chosen by minimizing the AIC-criterion

$$AIC(p, K_1, \dots, K_p) = \log \hat\sigma^2(p, K_1, \dots, K_p) + 2\Big( p + 1 + \sum_{j=1}^{p} K_j \Big) \Big/ T.$$

Table 1 shows these values for $p = 2$ and different $K_1$ and $K_2$. The values for other $p$
turned out to be larger. Thus, a model with $p = 2$, $K_1 = 6$, $K_2 = 0$ is fitted. The function
$\alpha_1(u)$ and its estimate are plotted in Figure 3. For $\alpha_2(u)$ we obtain 0.71 (a constant is fitted
because of $K_2 = 0$) while the true $\alpha_2(u)$ is 0.81. Furthermore, $\hat\sigma^2 = 1.71$ while $\sigma^2 = 1.0$.
The corresponding (parametric) estimate of the spectrum is the right picture of Figure 2
and the difference to the true spectrum is plotted in Figure 4.

Given the small sample size the quality of the fit is remarkable. Two negative effects can be
observed. First, the fit of $\alpha_1(u)$ becomes rather bad outside $u_1 = 0.063$ and $u_M = 0.938$.
This is not surprising, due to the behavior of a polynomial and the fact that the use of
$\mathcal{L}_T^{BW}(\eta)$ as a distance only punishes bad fits inside the interval $[u_1, u_M]$. This end effect
improves if one chooses $K_1 = 8$ instead of $K_1 = 6$. A better way seems to be to modify $\mathcal{L}_T^{BW}(\eta)$
and to include periodograms of shorter lengths at the edges. The second effect is that the
peak in the spectrum is underestimated. This bias is in part due to the non-stationarity of
the process on intervals $(u_j - N/(2T),\, u_j + N/(2T))$ where $I_T(u_j,\lambda)$ is calculated.

Figure 3: True and estimated parameter curve $\alpha_1(\cdot)$

We mention that the above estimates can be written in closed form and calculated without
an optimization routine. More generally this holds for tvAR(p)-models if $\sigma^2$ is constant and
$\alpha_j(u) = \sum_{k=1}^{K} b_{jk}\, f_k(u)$ with some functions $f_1(u), \dots, f_K(u)$ (in the above case $f_k(u) = u^{k-1}$).
For details see Dahlhaus (1997), Section 4.

A closer look at the above estimate reveals that it is somehow the outcome of a two step 
procedure where in the first step the periodogram is calculated on segments (which implicitly 
includes some smoothing with bandwidth b = N/T) and afterwards the AR(p)-process with 
the above polynomials is fitted to the outcome (instead of a direct fit of the AR(p)-model 
and the polynomials to the data). We now make this more precise. 

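With the above form of the spectrum $f_\eta(u,\lambda)$ (cf. (5)) and Kolmogorov's formula (cf. Brockwell
and Davis, 1991, Theorem 5.8.1) we obtain with $\hat R_T(u_j)$ and $\hat r_T(u_j)$ as defined in (7)
after some straightforward calculations

$$\mathcal{L}_T^{BW}(\eta) = \frac{1}{2}\, \frac{1}{M} \sum_{j=1}^{M} \Big[ \log\big( 2\pi \sigma_\eta^2(u_j) \big) + \frac{1}{\sigma_\eta^2(u_j)} \Big( \hat c_T(u_j,0) - \hat r_T(u_j)'\, \hat R_T(u_j)^{-1}\, \hat r_T(u_j) \Big) \Big]$$
$$\qquad + \frac{1}{2}\, \frac{1}{M} \sum_{j=1}^{M} \frac{1}{\sigma_\eta^2(u_j)} \Big( \hat R_T(u_j)\, \alpha_\eta(u_j) + \hat r_T(u_j) \Big)'\, \hat R_T(u_j)^{-1}\, \Big( \hat R_T(u_j)\, \alpha_\eta(u_j) + \hat r_T(u_j) \Big).$$

We now plug in the Yule-Walker estimate $\hat\alpha_T(u) = -\hat R_T(u)^{-1}\, \hat r_T(u)$ with asymptotic variance
proportional to $\sigma^2(u)\, R(u)^{-1}$ and $\hat\sigma_T^2(u) = \hat c_T(u,0) - \hat r_T(u)'\, \hat R_T(u)^{-1}\, \hat r_T(u)$ with asymptotic
variance $2\sigma^4(u)$. Since $\log x = (x-1) - \frac{1}{2}(x-1)^2 + o((x-1)^2)$ we obtain

$$\mathcal{L}_T^{BW}(\eta) = \frac{1}{2}\, \frac{1}{M} \sum_{j=1}^{M} \frac{1}{\sigma_\eta^2(u_j)} \Big( \alpha_\eta(u_j) - \hat\alpha_T(u_j) \Big)'\, \hat R_T(u_j)\, \Big( \alpha_\eta(u_j) - \hat\alpha_T(u_j) \Big)$$
$$\qquad + \frac{1}{2}\, \frac{1}{M} \sum_{j=1}^{M} \frac{1}{2\sigma_\eta^4(u_j)} \Big( \sigma_\eta^2(u_j) - \hat\sigma_T^2(u_j) \Big)^2 + \frac{1}{2}\, \frac{1}{M} \sum_{j=1}^{M} \Big[ \log\big( 2\pi \hat\sigma_T^2(u_j) \big) + 1 \Big] + o\Big( \frac{1}{M} \sum_{j=1}^{M} \Big( \frac{\hat\sigma_T^2(u_j)}{\sigma_\eta^2(u_j)} - 1 \Big)^2 \Big).$$

If the model is correctly specified then we have for $\eta$ close to the minimum: $\sigma_\eta^2(u_j)^{-1}\, \hat R_T(u_j) \approx \sigma^2(u_j)^{-1}\, R(u_j)$
and $2\sigma_\eta^4(u_j) \approx 2\sigma^4(u_j)$ which means that $\hat\eta_T$ is approximately obtained by a
weighted least squares fit of $\alpha_\eta(u)$ and $\sigma_\eta^2(u)$ to the Yule-Walker estimates on the segments.
The method works in this case since the (parametric!) model fitted in the second step is
somehow 'smoother' than the first smoothing implicitly induced by using the periodogram
on a segment. However, we would clearly run into problems if the fitted polynomials were
of high order or if even $K_j = K_j(T) \to \infty$ as $T \to \infty$.

Figure 4: Difference of estimated and true spectrum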



A good alternative seems to be to use the quasi-likelihood from (89) or (in particular for
AR(p)-models) the conditional likelihood estimate from (30) with $\ell_{t,T}(\cdot)$ as in (23) for which
the estimator can explicitly be calculated if $\sigma(\cdot) = c$. For $\sigma(\cdot) \ne c$ iterative or approximative
solutions are needed. The properties of this estimator have not been investigated yet. In any
case the benefit of the likelihood $\mathcal{L}_T^{BW}(\eta)$ and even more of the improved likelihood from (89)
is their generality because they can be applied to arbitrary parametric models which can
be identified from the second order spectrum.

Furthermore, algorithmic issues, such as in-order algorithms (e.g. generalizations of the 
Levinson-Durbin algorithm) need to be developed. 

5. Inference for nonparametric tvAR-models - an overview

In the last section we studied parametric estimates for tvAR(p)-models. This is an important
option if the length of the time series is short or if we have specific parametric models in
mind. In general however one would prefer nonparametric models. For nonparametric
statistics a large variety of different estimates are available (local polynomial fits, estimation
under shape restrictions, wavelet methods etc.) and it turns out that it is not too difficult
to apply such methods to tvAR(p)-models and moreover also to other possibly nonlinear
models (while the derivation of the corresponding theory may be very challenging). A key
role is played by the conditional likelihood at time $t$ which in the tvAR(p)-case is

$$\ell_{t,T}(\theta) := -\log f_\theta\big( X_{t,T} \,\big|\, X_{t-1,T}, \dots, X_{1,T} \big) \qquad (22)$$
$$\qquad\quad = \frac{1}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \Big( X_{t,T} + \sum_{j=1}^{p} \alpha_j X_{t-j,T} \Big)^2 \qquad (23)$$

where $\theta = (\alpha_1, \dots, \alpha_p, \sigma^2)'$, and its approximation defined in (96). As a simple
example consider the estimation of the curve $\alpha_1(\cdot)$ of a tvAR(1)-process by a local linear fit
given by $\hat\alpha_1(u_0) = \hat c_0$ where

$$(\hat c_0, \hat c_1) = \mathop{\mathrm{argmin}}_{c_0, c_1}\; \sum_{t=1}^{T} K\Big( \frac{u_0 - t/T}{b} \Big) \Big( X_{t,T} + \Big[ c_0 + c_1 \Big( \frac{t}{T} - u_0 \Big) \Big] X_{t-1,T} \Big)^2 \qquad (24)$$

or more generally (with vectors $c_0$ and $c_1$) given by $\hat\theta(u_0) = \hat c_0$ with

$$(\hat c_0, \hat c_1) = \mathop{\mathrm{argmin}}_{c_0, c_1}\; \sum_{t=1}^{T} K\Big( \frac{u_0 - t/T}{b} \Big)\, \ell_{t,T}\Big( c_0 + c_1 \Big( \frac{t}{T} - u_0 \Big) \Big). \qquad (25)$$
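The local linear fit for a tvAR(1)-process is a kernel weighted least squares problem and can be sketched in a few lines. The code below is an illustration under assumptions made here: the kernel 6(1/4 - x^2) on [-1/2, 1/2], the bandwidth, the coefficient curve used for the simulated data and all function names are hypothetical choices.

```python
import numpy as np

def epanechnikov(v):
    """Kernel on [-1/2, 1/2] that integrates to one."""
    return np.where(np.abs(v) <= 0.5, 6.0 * (0.25 - v**2), 0.0)

def local_linear_tvar1(x, u0, b, kernel=epanechnikov):
    """Local linear estimate of alpha_1(u0) for a mean zero tvAR(1) series x."""
    T = len(x)
    t = np.arange(2, T + 1)              # ordinary time of the response X_{t,T}
    y = x[1:]                            # X_{t,T}
    lagged = x[:-1]                      # X_{t-1,T}
    w = kernel((u0 - t / T) / b)
    # Minimise sum_t w_t * (y_t + [c0 + c1 (t/T - u0)] * lagged_t)^2 over (c0, c1),
    # i.e. a weighted least squares regression of -y on the two columns of Z.
    Z = np.column_stack([lagged, (t / T - u0) * lagged])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], -y * sw, rcond=None)
    return coef[0]                       # c0_hat is the estimate of alpha_1(u0)

# Simulated tvAR(1) data with a hypothetical linear coefficient curve.
rng = np.random.default_rng(3)
T = 4000
alpha = lambda u: -0.8 + 0.6 * u
x = np.zeros(T)
eps = rng.standard_normal(T)
for i in range(1, T):
    x[i] = -alpha((i + 1) / T) * x[i - 1] + eps[i]

alpha_hat = local_linear_tvar1(x, u0=0.5, b=0.2)
print(alpha_hat, alpha(0.5))
```

Dropping the second column of `Z` gives the corresponding kernel (local constant) estimate for the case of a constant innovation variance.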

Besides this local linear estimate many other estimates can be constructed based on the
conditional likelihood $\ell_{t,T}(\theta)$ from above:

1. A kernel estimate defined by

$$\hat\theta(u_0) = \mathop{\mathrm{argmin}}_{\theta}\; \frac{1}{bT} \sum_{t=1}^{T} K\Big( \frac{u_0 - t/T}{b} \Big)\, \ell_{t,T}(\theta). \qquad (26)$$

This estimate is studied in Section 3. We are convinced that it is equivalent to the
local Yule-Walker estimate from (7) with $K(x) = h(x)^2$, $b = N/T$ and that all results
from Section 2.3 are exactly the same for this estimate.

2. A local polynomial fit defined by $\hat\theta(u_0) = \hat c_0$ with

$$(\hat c_0, \dots, \hat c_d)' = \mathop{\mathrm{argmin}}_{c_0, \dots, c_d}\; \frac{1}{bT} \sum_{t=1}^{T} K\Big( \frac{u_0 - t/T}{b} \Big)\, \ell_{t,T}\Big( \sum_{j=0}^{d} c_j \Big( \frac{t}{T} - u_0 \Big)^j \Big). \qquad (27)$$

Local polynomial fits for tvAR(p)-models have been investigated by Kim (2001) and
Jentsch (2006).

3. An orthogonal series estimate (e.g. a wavelet estimate) defined by

$$\bar\beta = \mathop{\mathrm{argmin}}_{\beta}\; \frac{1}{T} \sum_{t=1}^{T} \ell_{t,T}\Big( \sum_{j=1}^{J(T)} \beta_j\, \psi_j\Big( \frac{t}{T} \Big) \Big) \qquad (28)$$

together with some shrinkage of $\bar\beta$ to obtain $\hat\beta$ and $\hat\theta(u_0) = \sum_{j=1}^{J(T)} \hat\beta_j\, \psi_j(u_0)$. Usually
$J(T) \to \infty$ as $T \to \infty$. Such an estimate has been investigated for a truncated wavelet
expansion for tvAR(p)-models in Dahlhaus, Neumann and von Sachs (1999).

4. A nonparametric maximum likelihood estimate defined by

$$\hat\theta(\cdot) = \mathop{\mathrm{argmin}}_{\theta(\cdot) \in \Theta}\; \frac{1}{T} \sum_{t=1}^{T} \ell_{t,T}\Big( \theta\Big( \frac{t}{T} \Big) \Big) \qquad (29)$$

where $\Theta$ is an adequate function space, for example a space of curves under shape
restrictions such as monotonicity constraints. In Dahlhaus and Polonik (2006) the
estimation of a monotonic variance function in a tvAR-model is studied, including
explicit algorithms involving isotonic regression.

5. A parametric fit for the curves $\theta(\cdot) = \theta_\eta(\cdot)$ with $\eta \in \mathbb{R}^q$ defined by

$$\hat\eta = \mathop{\mathrm{argmin}}_{\eta}\; \frac{1}{T} \sum_{t=1}^{T} \ell_{t,T}\Big( \theta_\eta\Big( \frac{t}{T} \Big) \Big). \qquad (30)$$

The resulting estimate has not been investigated yet. It is presumably very close to
the exact MLE studied in Theorem 5.1.



Remark 2.4 (i) In the tvAR(p)-case the situation simplifies a lot if $\sigma^2(\cdot) = c$. In that case
the estimates for $\alpha(\cdot)$ and $\sigma^2$ "split" and $\ell_{t,T}(\theta)$ can in all cases be replaced by
$\big( X_{t,T} + \sum_{j=1}^{p} \alpha_j X_{t-j,T} \big)^2$ leading to least squares type estimates.

(ii) All estimates from above can be transferred to other models by using the conditional
likelihood (22) for the specific model. The kernel estimate will be investigated in Section 3.

(iii) As mentioned above an alternative choice is to replace $\ell_{t,T}(\theta)$ by the local generalized
Whittle likelihood from (96). With that likelihood several estimates from above
have been investigated - see the detailed discussion at the end of Section 5. In that case the
d-dimensional parameter curve $\theta(\cdot) = (\theta_1(\cdot), \dots, \theta_d(\cdot))'$ must be uniquely identifiable from
the time varying spectrum $f(u,\lambda) = f_{\theta(u)}(\lambda)$. □

6. Shape- and transition curves 

There exist several alternative models for tvAR-processes - in particular models where
specific characteristics of the time series are modeled by a curve. Below we give 4 examples
where we restrict ourselves to tvAR(2)-models. Suppose we have a stationary AR(2)-model
with complex roots $\frac{1}{r}\exp(i\phi)$ and $\frac{1}{r}\exp(-i\phi)$, that is with parameters $\alpha_1 = -2r\cos(\phi)$,
$\alpha_2 = r^2$, and variance $\sigma^2$. The corresponding process shows a quasi-periodic behavior with
period of length $2\pi/\phi$, that is with frequency $\phi$. The closer $r$ gets to 1, the closer the
shape of the process gets to a sine wave. The amplitude is proportional to $\sigma$ (if $\sigma$
in the defining equation is replaced by $c \cdot \sigma$, then $X_t$ is replaced by $c \cdot X_t$).

In the specific tvAR(2)-case we can now consider the following shape- and transition-models
for quasi-periodic processes:

(i) Model with a time varying amplitude curve:

$\alpha_1(\cdot)$, $\alpha_2(\cdot)$ constant; $\sigma(\cdot)$ time varying.

Chandler and Polonik (2006) use this model with a unimodal $\sigma(\cdot)$ and a nonparametric
maximum likelihood estimate for the discrimination of earthquakes and explosions. The
properties of the estimator have been investigated in Dahlhaus and Polonik (2006).

(ii) Model with a time varying frequency curve:

$\alpha_1(\cdot) = -2r\cos(\phi(\cdot))$, $\alpha_2(\cdot) = r^2$ with $r$ constant and $\phi(\cdot)$ time varying, $\sigma(\cdot)$ constant.

The model in Figure 1 is of this form with $r = 0.9$ and $\phi(u) = 1.5 - \cos 4\pi u$.

(iii) Model with a time varying period-distinctiveness:

$\alpha_1(\cdot) = -2r(\cdot)\cos(\phi)$, $\alpha_2(\cdot) = r(\cdot)^2$ with $r(\cdot)$ time varying and $\phi$ constant, $\sigma(\cdot)$ constant.

(iv) Transition models: Amado and Terasvirta (2011) have recently used the logistic transition
function to model parameter transitions in GARCH-models. The simplest transition
function is

$$G\Big(\frac{t}{T}; \gamma, c\Big) := \Big[1 + \exp\Big\{-\gamma\Big(\frac{t}{T} - c\Big)\Big\}\Big]^{-1}.$$

Since $G(0; \gamma, c) \approx 0$ and $G(1; \gamma, c) \approx 1$ the model

$$\alpha_1(u) = \alpha_1^{start} + G(u; \gamma, c)\,\big(\alpha_1^{end} - \alpha_1^{start}\big), \qquad \alpha_2(u) = \alpha_2^{start} + G(u; \gamma, c)\,\big(\alpha_2^{end} - \alpha_2^{start}\big)$$

is a parametric model for a smooth transition from the AR-model with parameters $(\alpha_1^{start}, \alpha_2^{start})$
at $u = 0$ to the model with parameters $(\alpha_1^{end}, \alpha_2^{end})$ at $u = 1$. Here $c$ and $\gamma$ are the location
and the 'smoothness' of transition respectively. More general transition models (in particular
with more states) may be found in Amado and Terasvirta (2011). $G(\cdot; \gamma, c)$ may also be
replaced by a (nonparametric) function $G(\cdot)$ with $G(0) = 0$ and $G(1) = 1$.
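The shape and transition models above are easy to experiment with numerically. The following is a minimal Python sketch, not taken from the paper: it simulates the time varying frequency model (ii) with $r = 0.9$ and $\phi(u) = 1.5 - \cos 4\pi u$ and evaluates the logistic transition function. The function names, the zero start values and the Gaussian innovations are assumptions made for illustration only.

```python
import math
import random


def logistic_transition(u, gamma, c):
    """Logistic transition function G(u; gamma, c) = 1 / (1 + exp(-gamma (u - c)))."""
    return 1.0 / (1.0 + math.exp(-gamma * (u - c)))


def transition_coefficient(u, start, end, gamma, c):
    """Smooth transition start + G(u; gamma, c) * (end - start) as in case (iv)."""
    return start + logistic_transition(u, gamma, c) * (end - start)


def tvar2_coefficients(u, r=0.9):
    """Model (ii): alpha_1(u) = -2 r cos(phi(u)), alpha_2(u) = r^2."""
    phi = 1.5 - math.cos(4.0 * math.pi * u)
    return -2.0 * r * math.cos(phi), r * r


def simulate_tvar2(T, sigma=1.0, seed=0):
    """Simulate X_t + a1(t/T) X_{t-1} + a2(t/T) X_{t-2} = sigma * eps_t for t = 1..T.

    Assumptions of this sketch: Gaussian innovations and zero start values.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for t in range(1, T + 1):
        a1, a2 = tvar2_coefficients(t / T)
        x.append(-a1 * x[-1] - a2 * x[-2] + sigma * rng.gauss(0.0, 1.0))
    return x[2:]
```

Calling `simulate_tvar2(1000)` returns one path of length 1000. Replacing `tvar2_coefficients` by two calls to `transition_coefficient` would give a path of the transition model (iv) under the same assumptions.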

It is obvious that all methods from subsection 5 can be applied in cases (i)-(iv) to estimate the 
constant parameters and the shape- and transition-curves. We mention that the following
results apply to these models: the theoretical results for local Whittle estimates of
Dahlhaus and Giraitis (1998) (cf. Example 3.6), the uniform convergence result for the
local generalized Whittle estimate in Theorem 6.9, the asymptotic results of Dahlhaus and
Neumann (2001) where the parameter curves are estimated by a nonlinear wavelet method,
the results of Dahlhaus and Polonik (2006) on nonparametric maximum likelihood estimates
under shape constraints, the results for parametric models in Theorem 5.1 on the MLE and
the generalized Whittle estimator, and the results in Dahlhaus (1997) on the block Whittle
estimator.



3 Local likelihoods, derivative processes and nonlinear models 
with time varying parameters 

In this section we present a more general framework for time series with time varying finite
dimensional parameters $\theta(\cdot)$ and show how nonparametric inference can be done and theoretically
handled. Typically such models result from the generalization of classical parametric
models to the time varying case. If we restrict ourselves to linear processes or even more to
Gaussian processes then a much more general theory is possible which is developed in the
subsequent sections. Large parts of the present section are based on the ideas presented in
Dahlhaus and Subba Rao (2006) where time varying ARCH-models have been investigated.

The key idea is to use at each time point $u_0 \in (0, 1)$ the stationary approximation $\tilde X_t(u_0)$
to the original process $X_{t,T}$ and to calculate the bias resulting from the use of this approximation.
This will end in Taylor-type expansions of $X_{t,T}$ in terms of so-called derivative
processes. These expansions play a major role in the theoretical derivations.

Suppose for example that we estimate the multivariate parameter curve $\theta(\cdot)$ by minimizing
the (negative) local conditional log-likelihood, that is

$$\hat\theta_T(u_0) := \operatorname{argmin}_{\theta \in \Theta} \mathcal{L}_T(u_0, \theta) \qquad (31)$$

with

$$\mathcal{L}_T(u_0, \theta) := \frac{1}{T} \sum_{t=1}^{T} \frac{1}{b} K\Big(\frac{u_0 - t/T}{b}\Big)\, \ell_{t,T}(\theta)$$

and

$$\ell_{t,T}(\theta) := -\log f_\theta\big(X_{t,T} \,|\, X_{t-1,T}, \ldots, X_{1,T}\big)$$

where $K$ is symmetric, has compact support $[-\frac{1}{2}, \frac{1}{2}]$ and fulfills $\int K(x)\, dx = 1$. We
assume that $b = b_T \to 0$ and $bT \to \infty$ as $T \to \infty$. Two examples for this likelihood are
given below.

We approximate $\mathcal{L}_T(u_0, \theta)$ with $\tilde{\mathcal{L}}_T(u_0, \theta)$ which is the same function but with $\ell_{t,T}(\theta)$
replaced by

$$\tilde\ell_t(u_0, \theta) := -\log f_\theta\big(\tilde X_t(u_0) \,|\, \tilde X_{t-1}(u_0), \ldots, \tilde X_1(u_0)\big),$$

which means that $X_{t,T}$ is replaced by its stationary approximation $\tilde X_t(u_0)$. Usually this is
the local conditional likelihood for the process $\tilde X_t(u_0)$.

Example 3.1 (i) Consider the tvAR(p) process defined above together with its stationary
approximation at time $u_0$ given by (3). Under suitable regularity conditions it can be shown
that $X_{t,T} = \tilde X_t(u_0) + O_p\big(|\frac{t}{T} - u_0| + \frac{1}{T}\big)$ (cf. (51)). In case where the $\varepsilon_t$ are Gaussian the
conditional likelihood at time $t$ is given by

$$\ell_{t,T}(\theta) = \frac{1}{2} \log(2\pi \sigma^2) + \frac{1}{2\sigma^2} \Big(X_{t,T} + \sum_{j=1}^{p} \alpha_j X_{t-j,T}\Big)^2 \qquad (32)$$

where $\theta = (\alpha_1, \ldots, \alpha_p, \sigma^2)'$. It is easy to show that the resulting estimate is the same as
in (7) but with $r_T(u_0) := (c_T(u_0, 0, 1), \ldots, c_T(u_0, 0, p))'$ and $R_T(u_0) := \{c_T(u_0, i, j)\}_{i,j=1,\ldots,p}$
with the local covariance estimator $c_T(u, i, j)$ as defined in (10).

(ii) A tvARCH(p) model where $\{X_{t,T}\}$ is assumed to satisfy the representation

$$X_{t,T} = \sigma_{t,T} Z_t \quad \text{where} \quad \sigma_{t,T}^2 = \alpha_0\Big(\frac{t}{T}\Big) + \sum_{j=1}^{p} \alpha_j\Big(\frac{t}{T}\Big) X_{t-j,T}^2 \quad \text{for } t = 1, \ldots, T \qquad (33)$$

with $Z_t$ being independent, identically distributed random variables with $E Z_t = 0$, $E Z_t^2 = 1$.
The corresponding stationary approximation $\tilde X_t(u_0)$ at time $u_0$ is given by

$$\tilde X_t(u_0) = \sigma_t(u_0) Z_t \quad \text{where} \quad \sigma_t(u_0)^2 = \alpha_0(u_0) + \sum_{j=1}^{p} \alpha_j(u_0)\, \tilde X_{t-j}(u_0)^2 \quad \text{for } t \in \mathbb{Z}. \qquad (34)$$

It is shown in Dahlhaus and Subba Rao (2006) that $\{X_{t,T}^2\}$ as defined above has an almost
surely well-defined unique solution in the set of all causal solutions and $X_{t,T}^2 = \tilde X_t(u_0)^2 +
O_p\big(|\frac{t}{T} - u_0| + \frac{1}{T}\big)$. In case where the $Z_t$ are Gaussian the conditional likelihood is given by

$$\ell_{t,T}(\theta) = \frac{1}{2} \log w_{t,T}(\theta) + \frac{X_{t,T}^2}{2 w_{t,T}(\theta)} \quad \text{with} \quad w_{t,T}(\theta) = \alpha_0 + \sum_{j=1}^{p} \alpha_j X_{t-j,T}^2 \qquad (35)$$

where $\theta = (\alpha_0, \ldots, \alpha_p)'$. Dahlhaus and Subba Rao (2006) prove consistency of the resulting
estimate also in case where the true process is not Gaussian. As an alternative Fryzlewicz
et al. (2008) propose a kernel normalized-least-squares estimator which has a closed form
and thus has some advantages over the above kernel estimate for small samples.



(iii) Another example is a tvGARCH(p,q)-process - see Example 3.9 □ 
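For the Gaussian tvAR case in part (i), minimizing the kernel-weighted sum of the conditional likelihoods (32) has a closed form: a weighted least squares fit for the AR coefficient followed by a weighted residual variance. The sketch below shows this for $p = 1$ only. It is an illustration under stated assumptions rather than the estimator analysed in the paper: the function names are hypothetical, the Epanechnikov kernel rescaled to $[-\frac{1}{2}, \frac{1}{2}]$ is one admissible choice of $K$, and edge effects are ignored.

```python
def epanechnikov(x):
    """Symmetric kernel with support [-1/2, 1/2] that integrates to 1."""
    return 1.5 * (1.0 - 4.0 * x * x) if abs(x) <= 0.5 else 0.0


def local_ar1_fit(x, u0, b):
    """Kernel-weighted Gaussian conditional likelihood fit of a tvAR(1) model.

    x[t - 1] holds X_{t,T} for t = 1, ..., T. Returns (alpha_hat, sigma2_hat)
    for the local model X_t + alpha * X_{t-1} = sigma * eps_t near time u0.
    """
    T = len(x)
    terms = []  # (weight, X_t, X_{t-1}) for all observations inside the window
    for t in range(2, T + 1):
        w = epanechnikov((u0 - t / T) / b)
        if w > 0.0:
            terms.append((w, x[t - 1], x[t - 2]))
    sxx = sum(w * xl * xl for w, _, xl in terms)
    sxy = sum(w * xt * xl for w, xt, xl in terms)
    alpha_hat = -sxy / sxx  # first order condition in alpha
    sw = sum(w for w, _, _ in terms)
    sigma2_hat = sum(w * (xt + alpha_hat * xl) ** 2 for w, xt, xl in terms) / sw
    return alpha_hat, sigma2_hat
```

Evaluating `local_ar1_fit(x, u0, b)` on a grid of `u0` values gives a curve estimate. For the tvARCH likelihood (35) no such closed form exists in general and a numerical minimizer would be needed.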



We now discuss the derivation of the asymptotic bias, mean squared error, consistency and
asymptotic normality of $\hat\theta_T(u_0)$ for an "arbitrary" local minimum distance function $\mathcal{L}_T(u_0, \theta)$
(keeping in mind the above local conditional likelihood). The results are obtained by approximating
$\mathcal{L}_T(u_0, \theta)$ with $\tilde{\mathcal{L}}_T(u_0, \theta)$ which is the same function but with $X_{t,T}$ replaced by
its stationary approximation $\tilde X_t(u_0)$. Typically both $\mathcal{L}_T(u_0, \theta)$ and $\tilde{\mathcal{L}}_T(u_0, \theta)$ will converge
to the same limit-function which we denote by $\mathcal{L}(u_0, \theta)$. Let

$$\theta_0(u_0) := \operatorname{argmin}_{\theta \in \Theta} \mathcal{L}(u_0, \theta).$$

If the model is correctly specified then typically $\theta_0(u_0)$ is the true curve. Furthermore, let

$$B_T(u_0, \theta) := \mathcal{L}_T(u_0, \theta) - \tilde{\mathcal{L}}_T(u_0, \theta).$$

The following two results describe how the asymptotic properties of $\hat\theta_T(u_0)$ can be derived.
They should be regarded as a general roadmap and the challenge is to prove the conditions
in a specific situation which may be quite difficult.

Theorem 3.2 (i) Suppose that $\Theta$ is compact with $\theta_0(u_0) \in \mathrm{Int}(\Theta)$, the function $\mathcal{L}(u_0, \theta)$
is continuous in $\theta$ and the minimum $\theta_0(u_0)$ is unique. If

$$\sup_{\theta \in \Theta} \big|\tilde{\mathcal{L}}_T(u_0, \theta) - \mathcal{L}(u_0, \theta)\big| \overset{P}{\to} 0 \qquad (36)$$

and

$$\sup_{\theta \in \Theta} \big|B_T(u_0, \theta)\big| \overset{P}{\to} 0 \qquad (37)$$

then

$$\hat\theta_T(u_0) \overset{P}{\to} \theta_0(u_0). \qquad (38)$$

(ii) Suppose in addition that $\mathcal{L}(u, \theta)$ and $\theta_0(u)$ are uniformly continuous in $u$ and the
convergence in (36) and (37) is uniform in $u_0 \in [0, 1]$. Then

$$\sup_{u_0 \in [0,1]} \big|\hat\theta_T(u_0) - \theta_0(u_0)\big| \overset{P}{\to} 0. \qquad (39)$$

Proof. The proof of (i) is standard - cf. the proof of Theorem 2 in Dahlhaus and Subba
Rao (2006). The proof of (ii) is a straightforward generalization.



Note that in (i) all conditions apart from (37) are conditions on the stationary process $\tilde X_t(u_0)$
with (fixed) parameter $\theta(u_0)$ and the stationary likelihood / minimum-distance function
$\tilde{\mathcal{L}}_T(u_0, \theta)$. These properties are usually known from existing results on stationary processes.
It only remains to verify the condition (37) which can be done by using the expansion (51) in
terms of derivative processes (see the discussion below). (ii) contains a little pitfall: Usually
the estimate $\hat\theta_T(u_0)$ is defined for $u_0 = 0$ or $u_0 = 1$ in a different way due to edge-effects.
This means that also $\mathcal{L}_T(u_0, \theta)$ looks different, that is one would usually prefer a uniform
convergence result for $u_0 \in (0, 1)$ which is more difficult to prove.

Even more interesting and challenging is a uniform convergence result with a rate of convergence.
For time varying AR(p)-processes this is stated for a different likelihood in Theorem 6.9.
We mention that such a result usually requires an exponential bound and maximal
inequalities which need to be tailored to the specific model at hand.

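We now state the corresponding result on asymptotic normality in case of second order
smoothness. $\nabla$ denotes the derivatives with respect to the $\theta_i$, i.e. $\nabla := \big(\frac{\partial}{\partial \theta_i}\big)_{i=1,\ldots,d}$.

Theorem 3.3 Let $\theta_0 := \theta_0(u_0)$. Suppose that $\mathcal{L}_T(u_0, \theta)$, $\tilde{\mathcal{L}}_T(u_0, \theta)$ and $\mathcal{L}(u_0, \theta)$ are twice
continuously differentiable in $\theta$ with nonsingular matrix $\Gamma(u_0) := \nabla^2 \mathcal{L}(u_0, \theta_0)$. Let further

$$\sqrt{bT}\; \nabla \tilde{\mathcal{L}}_T(u_0, \theta_0) \overset{\mathcal{D}}{\to} \mathcal{N}\big(0, V(u_0)\big) \qquad (40)$$

with some sequence $b = b_T$ where $b \to 0$ and $bT \to \infty$ (the definition of $b$ is part of the
definition of the likelihood - it is usually some bandwidth) and

$$\sup_{\theta \in \Theta} \big|\nabla^2 \tilde{\mathcal{L}}_T(u_0, \theta) - \nabla^2 \mathcal{L}(u_0, \theta)\big| \overset{P}{\to} 0. \qquad (41)$$

If in addition

$$\sqrt{bT}\, \Big(\Gamma(u_0)^{-1}\, \nabla B_T(u_0, \theta_0) - \frac{b^2}{2}\, \mu^0(u_0)\Big) = o_p(1) \qquad (42)$$

with some $\mu^0(\cdot)$ (to be specified below - cf. (47)) and

$$\sup_{\theta \in \Theta} \big|\nabla^2 B_T(u_0, \theta)\big| \overset{P}{\to} 0$$

then

$$\sqrt{bT}\, \Big(\hat\theta_T(u_0) - \theta_0(u_0) + \frac{b^2}{2}\, \mu^0(u_0)\Big) \overset{\mathcal{D}}{\to} \mathcal{N}\big(0,\, \Gamma(u_0)^{-1} V(u_0)\, \Gamma(u_0)^{-1}\big). \qquad (43)$$

Proof. The usual Taylor-expansion of $\nabla \mathcal{L}_T(u_0, \theta)$ around $\theta_0$ yields

$$\sqrt{bT}\, \Big(\hat\theta_T(u_0) - \theta_0 + \Gamma(u_0)^{-1}\, \nabla B_T(u_0, \theta_0)\Big) = -\sqrt{bT}\; \Gamma(u_0)^{-1}\, \nabla \tilde{\mathcal{L}}_T(u_0, \theta_0) + o_p(1).$$

The result then follows immediately.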



Remark 3.4 (i) Again the first two conditions are conditions on the stationary process
$\tilde X_t(u_0)$ with (fixed) parameter $\theta(u_0)$ and the stationary likelihood / minimum-distance function
$\tilde{\mathcal{L}}_T(u_0, \theta)$ which are usually known from existing results on stationary processes.

(ii) Of course an analogous result also holds under different smoothness conditions and with
other rates than $b$ in (40) and (42).

(iii) Under additional regularity conditions one can usually prove that the same expansion
as in (43) also holds for the moments, leading to

$$E\, \hat\theta_T(u_0) = \theta_0(u_0) - \frac{b^2}{2}\, \mu^0(u_0) + o(b^2) \qquad (44)$$

and

$$\operatorname{var}\big(\hat\theta_T(u_0)\big) = \frac{1}{bT}\, \Gamma(u_0)^{-1} V(u_0)\, \Gamma(u_0)^{-1} + o\Big(\frac{1}{bT}\Big) \qquad (45)$$

(note that (43) is a stochastic expansion which does not automatically imply these moment
relations). The proof of these properties is usually not easy. □

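Example 3.5 (Kernel-type local likelihoods) We now return to the local conditional
likelihood (31) as a special case and provide some heuristics on how to calculate the above
terms (in particular the bias $\mu^0(u_0)$). We stress that in the concrete situation where a
specific model is given the exact proof usually goes along the same lines but the details may
be quite challenging.

Suppose that the local likelihood of the stationary process $\tilde X_t(u_0)$ converges in probability
to

$$\mathcal{L}(u_0, \theta) := \lim_{T \to \infty} \tilde{\mathcal{L}}_T(u_0, \theta) = \lim_{t \to \infty} E\, \tilde\ell_t(u_0, \theta).$$

Usually we have $X_{t,T} = \tilde X_t(t/T) + O_p(T^{-1})$ and

$$E\, \nabla \ell_{t,T}(\theta) = E\, \nabla \tilde\ell_t\Big(\frac{t}{T}, \theta\Big) + o\big((bT)^{-1/2}\big) = \nabla \mathcal{L}\Big(\frac{t}{T}, \theta\Big) + o\big((bT)^{-1/2}\big)$$

uniformly in $t$. A Taylor-expansion then leads in the case $b^3 = o\big((bT)^{-1/2}\big)$ with the symmetry
of the kernel $K$ to

$$E\, \nabla \mathcal{L}_T(u_0, \theta) = \frac{1}{bT} \sum_{t=1}^{T} K\Big(\frac{u_0 - t/T}{b}\Big)\, \nabla \mathcal{L}\Big(\frac{t}{T}, \theta\Big) + o\big((bT)^{-1/2}\big)$$

$$= \nabla \mathcal{L}(u_0, \theta) + \frac{1}{bT} \sum_{t=1}^{T} K\Big(\frac{u_0 - t/T}{b}\Big) \Big[\Big(\frac{t}{T} - u_0\Big) \frac{\partial}{\partial u} \nabla \mathcal{L}(u_0, \theta) + \frac{1}{2} \Big(\frac{t}{T} - u_0\Big)^2 \frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u_0, \theta)\Big] + o\big((bT)^{-1/2}\big)$$

$$= \nabla \mathcal{L}(u_0, \theta) + \frac{1}{2}\, b^2\, d_K\, \frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u_0, \theta) + o\big((bT)^{-1/2}\big) \qquad (46)$$

with $d_K := \int x^2 K(x)\, dx$. Since $E\, \nabla \tilde{\mathcal{L}}_T(u_0, \theta) = \nabla \mathcal{L}(u_0, \theta) + o\big((bT)^{-1/2}\big)$ this leads with
(40) to the bias term

$$\mu^0(u_0) = d_K\, \Gamma(u_0)^{-1}\, \frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u_0, \theta_0). \qquad (47)$$

Let $\theta_0 := \theta_0(u_0)$. If the model is correctly specified it usually can be shown that $\nabla \tilde\ell_t(u_0, \theta_0)$
is a martingale difference sequence and the conditions of the Lindeberg martingale central
limit theorem are fulfilled leading to

$$\sqrt{bT}\; \nabla \tilde{\mathcal{L}}_T(u_0, \theta_0) \overset{\mathcal{D}}{\to} \mathcal{N}\Big(0,\, v_K\, E \big(\nabla \tilde\ell_t(u_0, \theta_0)\big) \big(\nabla \tilde\ell_t(u_0, \theta_0)\big)'\Big)$$

with $v_K = \int K(x)^2\, dx$. Furthermore, if the model is correctly specified we usually have

$$E \big(\nabla \tilde\ell_t(u_0, \theta_0)\big) \big(\nabla \tilde\ell_t(u_0, \theta_0)\big)' = \nabla^2 \mathcal{L}(u_0, \theta_0) = \Gamma(u_0)$$

that is

$$\sqrt{bT}\, \Big(\hat\theta_T(u_0) - \theta_0(u_0) + \frac{b^2}{2}\, d_K\, \Gamma(u_0)^{-1}\, \frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u_0, \theta_0)\Big) \overset{\mathcal{D}}{\to} \mathcal{N}\big(0,\, v_K\, \Gamma(u_0)^{-1}\big). \qquad (48)$$

If we are able to prove in addition the formulas (44) and (45) on the asymptotic bias and
variance we obtain the same formula for the asymptotic mean squared error as in (11) with
$\tau(u_0) = \operatorname{tr}\{\Gamma(u_0)^{-1}\}$ and $\mu(u_0)^2$ replaced by $\|\mu(u_0)\|^2$ where $\mu(u_0) = \Gamma(u_0)^{-1}\, \frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u_0, \theta_0)$.
As in Remark 2.2 this leads to the optimal segment length and the optimal mean squared
error. The implications for non-rescaled processes are the same as in Remark 2.3. □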
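The bias-variance trade-off behind the optimal bandwidth can be written out in a few lines. The following is a heuristic sketch only, assuming that the moment relations (44) and (45) hold with $V(u_0) = v_K\, \Gamma(u_0)$ and $\mu^0(u_0) = d_K\, \mu(u_0)$ as in (47) and (48); it is not a substitute for the rigorous statements referred to above.

```latex
% Heuristic sketch under the assumptions stated in the lead-in.
\begin{align*}
E\,\big\|\hat\theta_T(u_0) - \theta_0(u_0)\big\|^2
  &\approx \frac{b^4}{4}\, d_K^2\, \|\mu(u_0)\|^2
   + \frac{v_K}{bT}\, \operatorname{tr}\{\Gamma(u_0)^{-1}\}, \\
\text{first order condition in } b:\quad
  b^3\, d_K^2\, \|\mu(u_0)\|^2
  &= \frac{v_K}{b^2 T}\, \operatorname{tr}\{\Gamma(u_0)^{-1}\}, \\
b_{\mathrm{opt}}(u_0)
  &= \Big[\frac{v_K\, \operatorname{tr}\{\Gamma(u_0)^{-1}\}}
               {d_K^2\, \|\mu(u_0)\|^2}\Big]^{1/5}\, T^{-1/5}.
\end{align*}
```

The squared bias grows like $b^4$ while the variance decays like $(bT)^{-1}$, so the minimizing bandwidth is of order $T^{-1/5}$ and is large where the curvature term $\|\mu(u_0)\|$ is small.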
We now present three examples where the above results have been proved explicitly. 



Example 3.6 (Local Whittle estimates) The first example are local Whittle estimates
on segments $\hat\theta_T^W(u_0)$ obtained by minimizing $\mathcal{L}_T^W(u_0, \theta)$ (cf. (18)). In case of a tvAR(p)-process
$\hat\theta_T^W(u_0)$ is exactly the local Yule-Walker estimate defined in (7) with the covariance-estimates
given in (8). $\mathcal{L}_T^W(u, \theta)$ is not exactly a local conditional likelihood as defined in
(31) but approximately (in the same sense as $c_T(u_0, k)$ from (8) is an approximation to
the kernel covariance estimate). For that reason the above heuristics also applies to this
estimate and can be made rigorous.

In Dahlhaus and Giraitis (1998), Theorem 3.1 and 3.2, bias and asymptotic normality of
$\hat\theta_T^W(u_0)$ have been derived rigorously including a derivation of the variance and the mean
squared error as given in (44) and (45) (i.e. not only the stochastic expansion in (43)). We
mention that therefore also the results on the optimal kernel and bandwidth in (12) and
(13) apply to this situation.

In the present situation we have (cf. Dahlhaus and Giraitis (1998), (3.7))

$$\mathcal{L}(u, \theta) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big\{\log 4\pi^2 f_\theta(\lambda) + \frac{f(u, \lambda)}{f_\theta(\lambda)}\Big\}\, d\lambda.$$

Therefore

$$\frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u_0, \theta) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \nabla f_\theta(\lambda)^{-1}\, \frac{\partial^2}{\partial u^2} f(u_0, \lambda)\, d\lambda$$

and in the correctly specified case where $f(u, \lambda) = f_{\theta_0(u)}(\lambda)$

$$\Gamma(u_0) = \nabla^2 \mathcal{L}(u_0, \theta_0) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \big(\nabla \log f_{\theta_0}\big) \big(\nabla \log f_{\theta_0}\big)'\, d\lambda$$

leading to the asymptotic bias $\mu(u_0)$ in (47) and the asymptotic variance in the central limit
theorem (48). A uniform convergence result for $\hat\theta_T^W(u_0)$ is stated in Theorem 6.9. □



Example 3.7 (tvAR(p)-processes) In the special case of a Gaussian tvAR(p)-process
the exact results for the local Yule-Walker estimates (7) follow as a special case from the
above results on local Whittle estimates (see also Section 2 in Dahlhaus and Giraitis, 1998,
where tvAR(p)-processes are discussed separately). In that case we have with $R(u)$ and
$r(u)$ as in (6) that $\Gamma(u) = \frac{1}{\sigma^2(u)} R(u)$. Furthermore

$$\nabla \mathcal{L}(u, \theta) = \frac{1}{\sigma^2} \big[R(u)\, \alpha + r(u)\big]$$

which implies

$$\mu(u_0) = R(u_0)^{-1} \Big[\Big(\frac{\partial^2}{\partial u^2} R(u)\Big) \alpha(u_0) + \frac{\partial^2}{\partial u^2} r(u)\Big]_{u = u_0}.$$

We conjecture that exactly the same asymptotic results hold for the conditional likelihood
estimate obtained by minimizing

$$\mathcal{L}_T(u_0, \theta) := \frac{1}{bT} \sum_{t=1}^{T} K\Big(\frac{u_0 - t/T}{b}\Big) \Big[\frac{1}{2} \log(2\pi \sigma^2) + \frac{1}{2\sigma^2} \Big(X_{t,T} + \sum_{j=1}^{p} \alpha_j X_{t-j,T}\Big)^2\Big].$$

□



We now introduce derivative processes. The key idea in the proofs of Dahlhaus and Giraitis
(1998) is to use at time $u_0 \in (0, 1)$ the stationary approximation $\tilde X_t(u_0)$ (there denoted
by $Y_t$) to the original process $X_{t,T}$ and to calculate the bias resulting from the use of this
approximation. As in Dahlhaus and Subba Rao (2006) we now extend this idea leading
to the Taylor-type expansion (51) which is an expansion of the original process in terms
of (usually ergodic) stationary processes called derivative processes. This expansion is a
powerful tool since all techniques for stationary processes including the ergodic theorem
may be applied for the local investigation of the nonstationary process $X_{t,T}$. The use of this
expansion and of derivative processes in general leads to a general structure of the proofs
and simplifies the derivations a lot.

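We start with the simple example of a tvAR(1)-process since in this case everything can
be calculated directly. Then $X_{t,T}$ is defined by $X_{t,T} + \alpha_1(t/T)\, X_{t-1,T} = \varepsilon_t$, $t \in \mathbb{Z}$, and the
stationary approximation $\tilde X_t(u_0)$ at time $u_0 = t_0/T$ by $\tilde X_t(u_0) + \alpha_1(u_0)\, \tilde X_{t-1}(u_0) = \varepsilon_t$, $t \in \mathbb{Z}$.
Repeated plug-in yields under suitable regularity conditions (for a rigorous argument see
the proof of Theorem 2.3 in Dahlhaus (1996a))

$$X_{t,T} = \sum_{j=0}^{\infty} (-1)^j \Big[\prod_{k=0}^{j-1} \alpha_1\Big(\frac{t-k}{T}\Big)\Big]\, \varepsilon_{t-j} = \tilde X_t\Big(\frac{t}{T}\Big) + O_p\Big(\frac{1}{T}\Big) \qquad (49)$$

$$= \tilde X_t(u_0) + \Big(\frac{t}{T} - u_0\Big) \frac{\partial \tilde X_t(u)}{\partial u}\Big|_{u=u_0} + O_p\Big(\Big(\frac{t}{T} - u_0\Big)^2 + \frac{1}{T}\Big). \qquad (50)$$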



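We have in the present situation

$$\frac{\partial \tilde X_t(u)}{\partial u} = \sum_{j=0}^{\infty} (-1)^j \big[j\, \alpha_1(u)^{j-1}\, \alpha_1'(u)\big]\, \varepsilon_{t-j},$$

that is $\frac{\partial \tilde X_t(u)}{\partial u}$ is a stationary ergodic process in $t$ with
$\big|\frac{\partial \tilde X_t(u)}{\partial u}\big| \le K \sum_{j=1}^{\infty} j\, \rho^{\,j-1}\, |\varepsilon_{t-j}|$ for some constant $K$, where
$|\rho| < 1$. In the same way we have

$$X_{t,T} = \tilde X_t(u_0) + \Big(\frac{t}{T} - u_0\Big) \frac{\partial \tilde X_t(u)}{\partial u}\Big|_{u=u_0} + \frac{1}{2} \Big(\frac{t}{T} - u_0\Big)^2 \frac{\partial^2 \tilde X_t(u)}{\partial u^2}\Big|_{u=u_0} + O_p\Big(\Big(\frac{t}{T} - u_0\Big)^3 + \frac{1}{T}\Big) \qquad (51)$$

with the second order derivative process $\frac{\partial^2 \tilde X_t(u)}{\partial u^2}\big|_{u=u_0}$ which is defined analogously. It is not
difficult to prove existence and uniqueness in a rigorous sense.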
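The tvAR(1) calculations can be checked numerically. The sketch below is an illustration under assumptions that are not from the paper: a hypothetical coefficient curve $\alpha_1(u) = 0.2 + 0.5u$, Gaussian innovations, a burn-in period to approximate the infinite past, and MA sums truncated after `n_terms` terms. It computes the process by its recursion, the stationary approximation and the derivative process from the same innovations, and prints the errors of the zeroth and first order expansions at one time point.

```python
import random


def alpha1(u):
    """Hypothetical smooth coefficient curve with |alpha1(u)| < 1 on [0, 1]."""
    return 0.2 + 0.5 * u


def alpha1_prime(u):
    """Derivative of alpha1 with respect to u."""
    return 0.5


def stationary_approx(eps, i, u, n_terms=200):
    """Truncated MA form sum_j (-alpha1(u))^j eps_{t-j}; i is the list index of t."""
    a = alpha1(u)
    return sum((-a) ** j * eps[i - j] for j in range(n_terms))


def derivative_process(eps, i, u, n_terms=200):
    """Truncated form of sum_j (-1)^j j alpha1(u)^(j-1) alpha1'(u) eps_{t-j}."""
    a, da = alpha1(u), alpha1_prime(u)
    return sum((-1) ** j * j * a ** (j - 1) * da * eps[i - j]
               for j in range(1, n_terms))


def simulate_tvar1(eps, T, burn_in):
    """Recursion X_{t,T} = eps_t - alpha1(t/T) X_{t-1,T}; list index i = t + burn_in."""
    x = [0.0] * len(eps)
    for i in range(1, len(eps)):
        t = i - burn_in
        x[i] = eps[i] - alpha1(t / T) * x[i - 1]
    return x


if __name__ == "__main__":
    T, burn_in, u0 = 2000, 300, 0.5
    rng = random.Random(0)
    eps = [rng.gauss(0.0, 1.0) for _ in range(burn_in + T + 1)]
    x = simulate_tvar1(eps, T, burn_in)
    t = int(u0 * T) + 40
    i = t + burn_in
    zeroth = stationary_approx(eps, i, u0)
    first = zeroth + (t / T - u0) * derivative_process(eps, i, u0)
    print("error of stationary approximation:", abs(x[i] - zeroth))
    print("error of first order expansion:   ", abs(x[i] - first))
```

The list index `i` must be at least `n_terms - 1` so that the truncated sums do not run past the start of the innovation list. Averaged over many time points near `u0`, the first order error is expected to be smaller than the zeroth order error, in line with (50).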

For general tvAR(p)-processes the same results hold - however, it is difficult in that case
to write the derivative process in explicit form. It is interesting to note that the derivative
process fulfills the equation

$$\sum_{j=0}^{p} \alpha_j(u)\, \frac{\partial \tilde X_{t-j}(u)}{\partial u} + \sum_{j=0}^{p} \alpha_j'(u)\, \tilde X_{t-j}(u) = \frac{\partial \sigma(u)}{\partial u}\, \varepsilon_t$$

where $\alpha_j'(u)$ denotes the derivative of $\alpha_j(u)$ with respect to $u$. This is formally obtained by
differentiating both sides of the defining equation of the stationary approximation.
Furthermore, it can be shown that this equation system uniquely defines the derivative process.



We are convinced that the expansion (51) and equation systems like (52) can be established
for several other locally stationary time series models. As mentioned above the important
point is that (51) is an expansion in terms of stationary processes.



In the next example we show how derivative processes are used for deriving the properties 
of local likelihood estimates. 

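Example 3.8 (tvARCH-processes) The definition of the processes $X_{t,T}$ and $\tilde X_t(u_0)$ has
been given above in (33) and (34) and of the local likelihood in (35) and (31). In Dahlhaus
and Subba Rao (2006), Theorem 2 and 3, consistency and asymptotic normality have been
established for the resulting estimate and in particular (48) has been proved. Derivative
processes play a major role in the proofs and we briefly indicate how they are used. First,
existence and uniqueness of the derivative processes have been proved including the Taylor-type
expansion for the process $X_{t,T}^2$:

$$X_{t,T}^2 = \tilde X_t(u_0)^2 + \Big(\frac{t}{T} - u_0\Big) \frac{\partial \tilde X_t(u)^2}{\partial u}\Big|_{u=u_0} + \frac{1}{2} \Big(\frac{t}{T} - u_0\Big)^2 \frac{\partial^2 \tilde X_t(u)^2}{\partial u^2}\Big|_{u=u_0} + O_p\Big(\Big(\frac{t}{T} - u_0\Big)^3 + \frac{1}{T}\Big) \qquad (52)$$

(in this model we are working with $X_{t,T}^2$ rather than $X_{t,T}$ since $X_{t,T}^2$ is uniquely determined).
Furthermore, $\frac{\partial \tilde X_t(u)^2}{\partial u}$ is almost surely the unique solution of the equation

$$\frac{\partial \tilde X_t(u)^2}{\partial u} = \Big(\alpha_0'(u) + \sum_{j=1}^{p} \alpha_j'(u)\, \tilde X_{t-j}(u)^2 + \sum_{j=1}^{p} \alpha_j(u)\, \frac{\partial \tilde X_{t-j}(u)^2}{\partial u}\Big)\, Z_t^2 \qquad (53)$$

which can formally be obtained by differentiating (34). By taking the second derivative of
this expression we obtain a similar expression for the second derivative $\frac{\partial^2 \tilde X_t(u)^2}{\partial u^2}$ etc.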



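A key step in the above proofs is the derivation of (40) and of the bias term $\mu^0(\cdot)$ in this
situation. We briefly sketch this. We have with $\theta_0 = \theta_0(u_0)$

$$\nabla B_T(u_0, \theta_0) = \frac{1}{bT} \sum_{t=1}^{T} K\Big(\frac{u_0 - t/T}{b}\Big) \Big(\nabla \ell_{t,T}(\theta_0) - \nabla \tilde\ell_t(u_0, \theta_0)\Big).$$

First $\nabla \ell_{t,T}(\theta_0)$ is replaced by $\nabla \tilde\ell_t(t/T, \theta_0)$ where we omit details (this works since $X_{t,T}^2$ is
approximately the same as $\tilde X_t(t/T)^2$). Then a Taylor-expansion is applied:

$$\nabla \tilde\ell_t\Big(\frac{t}{T}, \theta_0\Big) - \nabla \tilde\ell_t(u_0, \theta_0) = \Big(\frac{t}{T} - u_0\Big) \frac{\partial \nabla \tilde\ell_t(u, \theta_0)}{\partial u}\Big|_{u=u_0} + \frac{1}{2} \Big(\frac{t}{T} - u_0\Big)^2 \frac{\partial^2 \nabla \tilde\ell_t(u, \theta_0)}{\partial u^2}\Big|_{u=u_0} + \frac{1}{6} \Big(\frac{t}{T} - u_0\Big)^3 \frac{\partial^3 \nabla \tilde\ell_t(u, \theta_0)}{\partial u^3}\Big|_{u=\bar U_t} \qquad (54)$$

with a random variable $\bar U_t \in (0, 1]$. The breakthrough now is that $\frac{\partial \nabla \tilde\ell_t(u, \theta_0)}{\partial u}$ can be written
explicitly in terms of the derivative process $\frac{\partial \tilde X_t(u)^2}{\partial u}$ and of the process $\tilde X_t(u)^2$, that is we
obtain with the formula for the total derivative

$$\frac{\partial \nabla \tilde\ell_t(u, \theta_0)}{\partial u} = \frac{1}{2} \sum_{j=0}^{p} \frac{\partial}{\partial \tilde X_{t-j}(u)^2} \Big(\frac{\nabla \tilde w_t(u, \theta_0)}{\tilde w_t(u, \theta_0)} - \frac{\tilde X_t(u)^2\, \nabla \tilde w_t(u, \theta_0)}{\tilde w_t(u, \theta_0)^2}\Big)\, \frac{\partial \tilde X_{t-j}(u)^2}{\partial u}$$

where $\tilde w_t(u, \theta) = \alpha_0 + \sum_{j=1}^{p} \alpha_j\, \tilde X_{t-j}(u)^2$ (the same holds true for the higher order
terms). In particular $\frac{\partial \nabla \tilde\ell_t(u, \theta_0)}{\partial u}$ is a stationary process with constant mean. Due to the symmetry
of the kernel we therefore obtain after some lengthy but straightforward calculations

$$\sqrt{bT}\, \Big(\Gamma(u_0)^{-1}\, \nabla B_T(u_0, \theta_0) - \frac{b^2}{2}\, d_K\, \Gamma(u_0)^{-1}\, \frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u, \theta_0)\Big|_{u=u_0}\Big) = o_p(1). \qquad (55)$$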



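A very simple example is the tvARCH(0) process

$$X_{t,T} = \sigma_{t,T} Z_t, \qquad \sigma_{t,T}^2 = \alpha_0\Big(\frac{t}{T}\Big).$$

In this case $\frac{\partial \tilde X_t(u)^2}{\partial u} = \alpha_0'(u)\, Z_t^2$ and we have

$$\frac{\partial^2}{\partial u^2} \nabla \mathcal{L}(u, \theta_0)\Big|_{u=u_0} = -\frac{1}{2}\, \frac{\alpha_0''(u_0)}{\alpha_0(u_0)^2} \quad \text{and} \quad \Gamma(u_0) = \frac{1}{2\, \alpha_0(u_0)^2}$$

that is $\mu(u_0) = -\alpha_0''(u_0)$. This is another example which illustrates how the bias is linked
to the nonstationarity of the process - if the process were stationary the derivatives of $\alpha_0(\cdot)$
would be zero causing the bias also to be zero. The formula (13) for the optimal bandwidth
leads in this case to

$$b_{opt}(u_0) = \Big[\frac{2 v_K}{d_K^2}\Big]^{1/5} \Big[\frac{\alpha_0(u_0)}{\alpha_0''(u_0)}\Big]^{2/5}\, T^{-1/5}$$

leading to a large bandwidth if $\alpha_0''(u_0)$ is small and vice versa. As in Remark 2.3 this can
be "translated" to the non-rescaled case. □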
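The bandwidth formula for the tvARCH(0) case is simple enough to evaluate directly. The following Python sketch is illustrative only: the function name is hypothetical, and the default kernel constants $v_K = 1.2$ and $d_K = 0.05$ are those of the Epanechnikov kernel rescaled to $[-\frac{1}{2}, \frac{1}{2}]$, which is an assumed choice of $K$. The absolute value reflects that the power $2/5$ is applied to the squared ratio.

```python
def optimal_bandwidth_tvarch0(alpha0, alpha0_second_deriv, T, v_K=1.2, d_K=0.05):
    """b_opt = [2 v_K / d_K^2]^(1/5) * |alpha0 / alpha0''|^(2/5) * T^(-1/5).

    alpha0 and alpha0_second_deriv are the values of the curve alpha_0 and of
    its second derivative at the rescaled time point of interest.
    """
    ratio = abs(alpha0 / alpha0_second_deriv)
    return (2.0 * v_K / d_K ** 2) ** 0.2 * ratio ** 0.4 * T ** (-0.2)
```

In practice $\alpha_0$ and $\alpha_0''$ are unknown, so a call such as `optimal_bandwidth_tvarch0(1.0, 2.0, 1000)` only shows how the bandwidth scales. A data-driven choice would need pilot estimates, which this sketch does not provide.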



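Example 3.9 (tvGARCH-processes) A tvGARCH(p, q)-process satisfies the following
representation

$$X_{t,T} = \sigma_{t,T} Z_t \quad \text{where} \quad \sigma_{t,T}^2 = \alpha_0\Big(\frac{t}{T}\Big) + \sum_{j=1}^{p} \alpha_j\Big(\frac{t}{T}\Big) X_{t-j,T}^2 + \sum_{i=1}^{q} \beta_i\Big(\frac{t}{T}\Big) \sigma_{t-i,T}^2, \qquad (56)$$

where $\{Z_t\}$ are iid random variables with $E Z_t = 0$ and $E Z_t^2 = 1$. The corresponding
stationary approximation at time $u_0$ is given by

$$\tilde X_t(u_0) = \sigma_t(u_0) Z_t \quad \text{for } t \in \mathbb{Z}$$

$$\text{where} \quad \sigma_t(u_0)^2 = \alpha_0(u_0) + \sum_{j=1}^{p} \alpha_j(u_0)\, \tilde X_{t-j}(u_0)^2 + \sum_{i=1}^{q} \beta_i(u_0)\, \sigma_{t-i}(u_0)^2. \qquad (57)$$

Under the condition that $\sup_u \big(\sum_{j=1}^{p} \alpha_j(u) + \sum_{i=1}^{q} \beta_i(u)\big) < 1$ Subba Rao (2006), Section 5,
has shown that $X_{t,T}^2 = \tilde X_t(u_0)^2 + O_p\big(|\frac{t}{T} - u_0| + \frac{1}{T}\big)$. To obtain estimators of the parameters
$\{\alpha_j(\cdot)\}$ and $\{\beta_i(\cdot)\}$ an approximation of the conditional quasi-likelihood is used, which is
constructed as if the innovations $\{Z_t\}$ were Gaussian. As the infinite past is unobserved, an
observable approximation of the conditional quasi-likelihood is

$$\ell_{t,T}(\theta) = \frac{1}{2} \log w_{t,T}(\theta) + \frac{X_{t,T}^2}{2 w_{t,T}(\theta)} \quad \text{with} \quad w_{t,T}(\theta) = c_0(\theta) + \sum_{j=1}^{t-1} c_j(\theta)\, X_{t-j,T}^2, \qquad (58)$$

where a recursive formula for $c_j(\theta)$ in terms of the parameters of interest, $\{\alpha_j\}$ and $\{\beta_i\}$,
can be found in Berkes et al. (2003). Given that the derivatives of the time varying GARCH
parameters exist we can formally differentiate (57) to obtain

$$\frac{\partial \tilde X_t(u)^2}{\partial u} = \frac{\partial \sigma_t(u)^2}{\partial u}\, Z_t^2,$$

$$\frac{\partial \sigma_t(u)^2}{\partial u} = \alpha_0'(u) + \sum_{j=1}^{p} \Big[\alpha_j'(u)\, \tilde X_{t-j}(u)^2 + \alpha_j(u)\, \frac{\partial \tilde X_{t-j}(u)^2}{\partial u}\Big] + \sum_{i=1}^{q} \Big[\beta_i'(u)\, \sigma_{t-i}(u)^2 + \beta_i(u)\, \frac{\partial \sigma_{t-i}(u)^2}{\partial u}\Big].$$

Subba Rao (2006) has shown that one can represent the above as a state-space representation
which almost surely has a unique solution which is the derivative of $\tilde X_t(u)^2$ with respect to
$u$. Thus $X_{t,T}^2$ satisfies the expansion in (52). Moreover, Fryzlewicz and Subba Rao (2011)
show geometric $\alpha$-mixing of the tvGARCH process. Using these results and under some
technical assumptions it can be shown that Theorem 3.2(i) and Theorem 3.3 hold for the
local approximate conditional quasi-likelihood estimator. In particular, a result analogous
to (55) holds true, where

$$\mathcal{L}(u, \theta) = \frac{1}{2}\, E\Big(\log\Big(c_0(\theta) + \sum_{j=1}^{\infty} c_j(\theta)\, \tilde X_{t-j}(u)^2\Big)\Big) + \frac{1}{2}\, E\Big(\frac{\tilde X_t(u)^2}{c_0(\theta) + \sum_{j=1}^{\infty} c_j(\theta)\, \tilde X_{t-j}(u)^2}\Big).$$

Amado and Terasvirta (2011) investigate parametric tvGARCH-models where the time varying
parameters are modeled with the logistic transition function - see Section 2.6. □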

Similar methods as described in this section have also been applied in Koo and Linton 
(2010) who investigate semiparametric estimation of locally stationary diffusion models. 



They also prove a central limit theorem with a bias term as in (42). In their proofs they
use the stationary approximation $\tilde X_t(u_0)$ and the Taylor-type expansion (51). Vogt (2011)
investigates nonlinear nonparametric models allowing for locally stationary regressors and a
regression function that changes smoothly over time.



4 A general definition, linear processes and time varying spectral densities

The intuitive idea for a general definition is to require that locally around each rescaled
time point $u_0$ the process $\{X_{t,T}\}$ can be approximated by a stationary process $\{\tilde X_t(u_0)\}$ in
a stochastic sense by using the property (Q (cf. Dahlhaus and Subba Rao, 2006). Vogt 
(2011) has formalized this by requiring that for each $u_0$ there exists a stationary process
$\tilde X_t(u_0)$ with

$$\big|X_{t,T} - \tilde X_t(u_0)\big| \le \Big(\Big|\frac{t}{T} - u_0\Big| + \frac{1}{T}\Big)\, U_{t,T}(u_0) \qquad (59)$$

where $U_{t,T}(u_0)$ is a positive stochastic process fulfilling some uniform moment conditions.
However up to now no general theory exists based on such a general definition.

In the following we move on towards a general theory for linear locally stationary processes. 
In some cases we even assume Gaussianity or use Gaussian likelihood methods and their 
approximations. In this situation a fairly general theory can be derived in which parametric 
and nonparametric inference problems, goodness of fit tests, bootstrap procedures etc. can
be treated in high generality. We use a general definition tailored for linear processes which
however implies (59).



Definition of linear locally stationary processes: We give this definition in terms of
the time varying MA($\infty$) representation

$$X_{t,T} = \mu\Big(\frac{t}{T}\Big) + \sum_{j=-\infty}^{\infty} a_{t,T}(j)\, \varepsilon_{t-j} \quad \text{where} \quad a_{t,T}(j) \approx a\Big(\frac{t}{T}, j\Big)$$

with coefficient functions $a(\cdot, j)$ which need to fulfill additional regularity conditions (depending
on the result to be proved - details are provided below). In several papers of the author
the time varying spectral representation

$$X_{t,T} = \mu\Big(\frac{t}{T}\Big) + \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \exp(i\lambda t)\, A_{t,T}(\lambda)\, d\xi(\lambda) \quad \text{where} \quad A_{t,T}(\lambda) \approx A\Big(\frac{t}{T}, \lambda\Big) \qquad (60)$$

with the time varying transfer function $A(\cdot, \lambda)$ was used instead. Both representations are basically
equivalent - see the derivation of (78). In the results presented below we will always use the
formulation "Under suitable regularity conditions ..." and refer the reader to the original
paper. We conjecture however that all results can be reproved under Assumption 4.1. We
emphasize that this is not an easy task since in most situations it means to transfer the proof
from the frequency to the time domain. In that case it would be worthwhile to require only
martingale differences $\varepsilon_t$ since also some nonlinear processes admit such a representation.

Let

$$V(g) = \sup\Big\{\sum_{k=1}^{m} |g(x_k) - g(x_{k-1})| : 0 \le x_0 < \ldots < x_m \le 1,\ m \in \mathbb{N}\Big\} \qquad (61)$$

be the total variation of $g$.

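Assumption 4.1 The sequence of stochastic processes $X_{t,T}$ has a representation

$$X_{t,T} = \mu\Big(\frac{t}{T}\Big) + \sum_{j=-\infty}^{\infty} a_{t,T}(j)\, \varepsilon_{t-j} \qquad (62)$$

where $\mu$ is of bounded variation and the $\varepsilon_t$ are iid with $E\varepsilon_t = 0$, $E\varepsilon_s \varepsilon_t = 0$ for $s \ne t$,
$E\varepsilon_t^2 = 1$. Let

$$\ell(j) := \begin{cases} 1, & |j| \le 1 \\ |j| \log^{1+\kappa} |j|, & |j| > 1 \end{cases}$$

for some $\kappa > 0$ and

$$\sup_{t} |a_{t,T}(j)| \le \frac{K}{\ell(j)} \quad \text{(with $K$ indep. of $T$)}. \qquad (63)$$

Furthermore we assume that there exist functions $a(\cdot, j) : (0, 1] \to \mathbb{R}$ with

$$\sup_{u} |a(u, j)| \le \frac{K}{\ell(j)}, \qquad (64)$$

$$\sup_{j} \sum_{t=1}^{T} \Big|a_{t,T}(j) - a\Big(\frac{t}{T}, j\Big)\Big| \le K, \qquad (65)$$

$$V\big(a(\cdot, j)\big) \le \frac{K}{\ell(j)}. \qquad (66)$$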

The above assumptions are weak in the sense that only bounded variation is required for the
coefficient functions. In particular for local results stronger smoothness assumptions have
to be imposed - for example in addition for some $i$

$$\sup_{u} \Big|\frac{\partial^i \mu(u)}{\partial u^i}\Big| \le K, \qquad (67)$$

$$\sup_{u} \Big|\frac{\partial^i a(u, j)}{\partial u^i}\Big| \le \frac{K}{\ell(j)} \quad \text{for } j = 0, 1, \ldots \qquad (68)$$

and instead of (65) the stronger assumption

$$\sup_{t, T} \Big|a_{t,T}(j) - a\Big(\frac{t}{T}, j\Big)\Big| \le \frac{K}{T\, \ell(j)}. \qquad (69)$$

The construction with $a_{t,T}(j)$ and $a(\frac{t}{T}, j)$ looks complicated at first glance. The function
$a(\cdot, j)$ is needed for rescaling and to impose necessary smoothness conditions while the
additional use of $a_{t,T}(j)$ makes the class rich enough to cover interesting cases such as tvAR-models
(the reason for this in the AR(1)-case can be understood from (49)). Cardinali and
Nason (2010) created the term close pair for $\big(a(\frac{t}{T}, j), a_{t,T}(j)\big)$. Usually, additional moment
conditions on $\varepsilon_t$ are required.

It is straightforward to construct the stationary approximation and the derivative processes.
We have

$$\tilde X_t(u) := \mu(u) + \sum_{j=-\infty}^{\infty} a(u, j)\, \varepsilon_{t-j}$$

and

$$\frac{\partial^i \tilde X_t(u)}{\partial u^i} = \frac{\partial^i \mu(u)}{\partial u^i} + \sum_{j=-\infty}^{\infty} \frac{\partial^i a(u, j)}{\partial u^i}\, \varepsilon_{t-j}$$

and it is easy to prove (59) and more generally the expansion (51). We define the time varying
spectral density by

$$f(u, \lambda) := \frac{1}{2\pi} |A(u, \lambda)|^2 \qquad (70)$$

where

$$A(u, \lambda) := \sum_{j=-\infty}^{\infty} a(u, j) \exp(-i\lambda j), \qquad (71)$$

and the time varying covariance of lag $k$ at rescaled time $u$ by

$$c(u, k) := \int_{-\pi}^{\pi} f(u, \lambda) \exp(i\lambda k)\, d\lambda = \sum_{j=-\infty}^{\infty} a(u, k + j)\, a(u, j). \qquad (72)$$

$f(u, \lambda)$ and $c(u, k)$ are the spectral density and the covariance function of the stationary
approximation $\tilde X_t(u)$. Under Assumption 4.1 and (69) it can be shown that

$$\operatorname{cov}\big(X_{[uT],T},\, X_{[uT]+k,T}\big) = c(u, k) + O(T^{-1}) \qquad (73)$$

uniformly in $u$ and $k$ - therefore we call $c(u, k)$ also the time varying covariance of the
processes $X_{t,T}$. In Theorem 4.3 we show that $f(u, \lambda)$ is the uniquely defined time varying
spectral density of $X_{t,T}$.

Example 4.2 (i) A simple example of a process $X_{t,T}$ which fulfills the above assumptions
is $X_{t,T} = \mu(\frac{t}{T}) + \phi(\frac{t}{T})\, Y_t$ where $Y_t = \sum_j a(j)\, \varepsilon_{t-j}$ is stationary with $|a(j)| \le K/\ell(j)$ and
$\mu$ and $\phi$ are of bounded variation. If $Y_t$ is an AR(2)-process with complex roots close
to the unit circle then $Y_t$ shows a periodic behavior and $\phi(\cdot)$ may be regarded as a time
varying amplitude modulating function of the process $X_{t,T}$. $\phi(\cdot)$ may either be parametric
or nonparametric.

(ii) The tvARMA(p,q) process

$$\sum_{j=0}^{p} \alpha_j\Big(\frac{t}{T}\Big) X_{t-j,T} = \sum_{k=0}^{q} \beta_k\Big(\frac{t}{T}\Big)\, \sigma\Big(\frac{t-k}{T}\Big)\, \varepsilon_{t-k} \qquad (74)$$

where $\varepsilon_t$ are iid with $E\varepsilon_t = 0$ and $E\varepsilon_t^2 < \infty$ and all $\alpha_j(\cdot)$, $\beta_k(\cdot)$ and $\sigma^2(\cdot)$ are of bounded
variation with $\alpha_0(\cdot) = \beta_0(\cdot) = 1$ and $\sum_{j=0}^{p} \alpha_j(u) z^j \ne 0$ for all $u$ and all $|z| \le 1 + \delta$ for some
$\delta > 0$, fulfills Assumption 4.1. If the parameters are differentiable with bounded derivatives
then also (67)-(69) are fulfilled (for $i = 1$). The time varying spectral density is

$$f(u, \lambda) = \frac{\sigma^2(u)}{2\pi}\, \frac{\big|\sum_{k=0}^{q} \beta_k(u) \exp(i\lambda k)\big|^2}{\big|\sum_{j=0}^{p} \alpha_j(u) \exp(i\lambda j)\big|^2}. \qquad (75)$$

This is proved in Dahlhaus and Polonik (2006). $\alpha_j(\cdot)$ and $\beta_k(\cdot)$ may either be parametric
or nonparametric. □
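The time varying spectral density (75) can be evaluated pointwise once the coefficient values at a fixed rescaled time $u$ are given. The following Python sketch does this with the standard library only. The function name and the convention that the coefficient lists start with $\alpha_0 = \beta_0 = 1$ are assumptions made for illustration.

```python
import cmath
import math


def tvarma_spectral_density(lam, alphas, betas, sigma2):
    """Evaluate (75) at frequency lam for coefficients frozen at one time u.

    alphas = [1, alpha_1(u), ..., alpha_p(u)], betas = [1, beta_1(u), ..., beta_q(u)],
    sigma2 = sigma^2(u).
    """
    num = abs(sum(b * cmath.exp(1j * lam * k) for k, b in enumerate(betas))) ** 2
    den = abs(sum(a * cmath.exp(1j * lam * j) for j, a in enumerate(alphas))) ** 2
    return sigma2 / (2.0 * math.pi) * num / den
```

For the tvAR(2) shape models with $\alpha_1(u) = -2r\cos(\phi(u))$ and $\alpha_2(u) = r^2$, calling the function on a grid of frequencies for several values of $u$ traces how the spectral peak near $\lambda = \phi(u)$ moves over time.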



The time varying MA($\infty$)-representation (62) can easily be transformed into a time varying
spectral representation as used e.g. in Dahlhaus (1997, 2000). If the $\varepsilon_t$ are assumed to be
stationary then there exists a Cramer representation (cf. Brillinger (1981))

$$\varepsilon_t = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \exp(i\lambda t)\, d\xi(\lambda) \qquad (76)$$

where $\xi(\lambda)$ is a process with mean 0 and orthonormal increments. Let

$$A_{t,T}(\lambda) := \sum_{j=-\infty}^{\infty} a_{t,T}(j) \exp(-i\lambda j). \qquad (77)$$

Then

$$X_{t,T} = \mu\Big(\frac{t}{T}\Big) + \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \exp(i\lambda t)\, A_{t,T}(\lambda)\, d\xi(\lambda). \qquad (78)$$

(69) now implies

$$\sup_{t, \lambda} \Big|A_{t,T}(\lambda) - A\Big(\frac{t}{T}, \lambda\Big)\Big| \le K T^{-1} \qquad (79)$$

which was assumed in the above cited papers. Conversely, if we start with (78) and (79)
then we can conclude from adequate smoothness conditions on $A(u, \lambda)$ to the conditions of
Assumption 4.1.



We now state a uniqueness property of our spectral representation. The Wigner-Ville spectrum for fixed $T$ (cf. Martin and Flandrin (1985)) is

$$f_T(u,\lambda) := \frac{1}{2\pi} \sum_{s=-\infty}^{\infty} \operatorname{cov}\big(X_{[uT-s/2],T},\, X_{[uT+s/2],T}\big)\,\exp(-i\lambda s)$$

with $X_{t,T}$ as in (62) (either with the coefficients extended as constants for $u \notin [0,1]$ or set to 0). Below we prove that $f_T(u,\lambda)$ tends in squared mean to $f(u,\lambda)$ as defined in (70). Therefore it is justified to call $f(u,\lambda)$ the time varying spectral density of the process.

Theorem 4.3 If $X_{t,T}$ is locally stationary and fulfills Assumption 4.1 and (68) for all $j$ then we have for all $u \in (0,1)$

$$\int_{-\pi}^{\pi} \big|f_T(u,\lambda) - f(u,\lambda)\big|^2\, d\lambda = o(1).$$

Proof. The result was proved in Dahlhaus (1996b) under a different set of conditions. It is not very difficult to prove the result also under the present conditions. □

As a consequence the time varying spectral density $f(u,\lambda)$ is uniquely defined. If in addition the process $X_{t,T}$ is non-Gaussian, then even $A(u,\lambda)$ and therefore also the coefficients $a(u,j)$ are uniquely determined, which may be proved similarly by considering higher-order spectra. Since $\mu(t/T)$ is the mean of the process it is also uniquely determined. This is remarkable since in the non-rescaled case time varying processes do not have a unique spectral density or a unique time varying spectral representation (cf. Priestley (1981), Chapter 11.1; Mélard and Herteleer-de Schutter (1989)). $f(u,\lambda)$ from Theorem 4.3 has been called instantaneous spectrum (in particular for tvAR-processes, cf. Kitagawa and Gersch (1985)). The above theorem gives a theoretical justification for this definition.

There is a huge benefit from having a unique time varying spectral density. We now give an example for this. We derive the limit of the Kullback-Leibler information for Gaussian processes and show that it depends on $f(u,\lambda)$. Replacing this by a spectral estimate will lead to a quasi likelihood for parametric models similar to the Whittle likelihood for stationary processes. Without a unique spectral density such a construction would not be possible.

Consider the exact Gaussian maximum likelihood estimate

$$\hat\eta_T^{ML} := \operatorname*{argmin}_{\eta}\, \mathcal{L}_T^{E}(\eta)$$

where $\eta$ is a finite-dimensional parameter (as in (20)) and

$$\mathcal{L}_T^{E}(\eta) := \frac{1}{2}\log(2\pi) + \frac{1}{2T}\log\det\Sigma_\eta + \frac{1}{2T}\,(X-\mu_\eta)'\,\Sigma_\eta^{-1}\,(X-\mu_\eta) \tag{80}$$

with $X = (X_{1,T},\dots,X_{T,T})'$, $\mu_\eta = (\mu_\eta(1/T),\dots,\mu_\eta(T/T))'$ and $\Sigma_\eta$ being the covariance matrix of the model. Under certain regularity conditions $\hat\eta_T^{ML}$ will converge to

$$\eta_0 := \operatorname*{argmin}_{\eta}\, \mathcal{L}(\eta) \tag{81}$$

where

$$\mathcal{L}(\eta) := \lim_{T\to\infty} \mathbf{E}\,\mathcal{L}_T^{E}(\eta).$$

If the model is correct, then typically $\eta_0$ is the true parameter value. Otherwise it is some "projection" onto the parameter space. It is therefore important to calculate $\mathcal{L}(\eta)$ which is equivalent to the calculation of the Kullback-Leibler information divergence.



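Theorem 4.4 Let $X_{t,T}$ be a locally stationary process with true mean and spectral density curves $\mu(\cdot)$, $f(u,\lambda)$ and model curves $\mu_\eta(\cdot)$, $f_\eta(u,\lambda)$ respectively. Under suitable regularity conditions we have

$$\mathcal{L}(\eta) = \lim_{T\to\infty} \mathbf{E}\,\mathcal{L}_T^{E}(\eta) = \frac{1}{4\pi}\int_0^1\!\!\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\eta(u,\lambda) + \frac{f(u,\lambda)}{f_\eta(u,\lambda)}\Big\}\,d\lambda\,du + \frac{1}{4\pi}\int_0^1 \frac{\big(\mu_\eta(u)-\mu(u)\big)^2}{f_\eta(u,0)}\,du.$$

Proof. See Dahlhaus (1996b), Theorem 3.4. □

The Kullback-Leibler information divergence for stationary processes is obtained from this as a special case (cf. Parzen (1983)).

Example 4.5 Suppose that the model is stationary, i.e. $f_\eta(\lambda) := f_\eta(u,\lambda)$ and $m := \mu_\eta(u)$ do not depend on $u$. Then

$$\mathcal{L}(\eta) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\eta(\lambda) + \frac{\int_0^1 f(u,\lambda)\,du}{f_\eta(\lambda)}\Big\}\,d\lambda + \frac{1}{4\pi}\,\frac{1}{f_\eta(0)}\int_0^1 \big(m-\mu(u)\big)^2\,du,$$

i.e. $m_0 = \int_0^1 \mu(u)\,du$, and $f_{\eta_0}(\lambda)$ gives the best approximation to the time integrated true spectrum $\int_0^1 f(u,\lambda)\,du$. These are the values which are "estimated" by the MLE or a quasi-MLE if a stationary model is fitted to locally stationary data. □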
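The "projection" onto a stationary model can be checked numerically in a very simple case. The sketch below is a minimal illustration under our own assumptions: the true process is time varying white noise with $f(u,\lambda) = \sigma^2(u)/(2\pi)$, the fitted model is stationary white noise with $f_\eta(\lambda) = \eta/(2\pi)$, both means are zero, and the curve $\sigma^2(u) = 1+u$ as well as the function name are hypothetical. The limit criterion then reduces to $\frac{1}{2}\{\log(2\pi\eta) + \int_0^1\sigma^2(u)\,du/\eta\}$, which is minimized by the time average of $\sigma^2(u)$.

```python
import numpy as np

def kl_limit_stationary_scale(eta, sigma2, n_u=2000):
    """Limit criterion L(eta) for true f(u,lam) = sigma2(u)/(2*pi) and model f_eta = eta/(2*pi)."""
    u = (np.arange(n_u) + 0.5) / n_u                         # midpoint rule on [0, 1]
    # 4*pi^2 * f_eta = 2*pi*eta and f/f_eta = sigma2(u)/eta; nothing depends on lam,
    # so the lam-integral over [-pi, pi] only contributes the factor 2*pi.
    integrand = np.log(2 * np.pi * eta) + sigma2(u) / eta
    return (2 * np.pi) / (4 * np.pi) * np.mean(integrand)

sigma2 = lambda u: 1.0 + u                                   # hypothetical time varying variance
grid = np.linspace(0.5, 3.0, 2501)                           # candidate values of eta, step 0.001
values = np.array([kl_limit_stationary_scale(e, sigma2) for e in grid])
eta0 = grid[np.argmin(values)]                               # minimizer, close to int_0^1 sigma2(u) du
```

The minimizer agrees with the time integrated variance, in line with the statement that a stationary fit estimates the time integrated spectrum.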



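Given the form of $\mathcal{L}(\eta)$ as in Theorem 4.4 we can now suggest a quasi-likelihood criterion

$$\mathcal{L}_T^{QL}(\eta) = \frac{1}{4\pi}\int_0^1\!\!\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\eta(u,\lambda) + \frac{\hat f(u,\lambda)}{f_\eta(u,\lambda)}\Big\}\,d\lambda\,du + \frac{1}{4\pi}\int_0^1 \frac{\big(\mu_\eta(u)-\hat\mu(u)\big)^2}{f_\eta(u,0)}\,du$$

where $\hat f(u,\lambda)$ and $\hat\mu(u)$ are suitable nonparametric estimates of $f(u,\lambda)$ and $\mu(u)$ respectively. The block Whittle likelihood $\mathcal{L}_T^{BW}(\eta)$ in (21) and the generalized Whittle likelihood $\mathcal{L}_T^{GW}(\eta)$ in (89) are of this form.

We now calculate the Fisher information matrix

$$\Gamma := \lim_{T\to\infty} T\,\mathbf{E}_{\eta_0}\big(\nabla\mathcal{L}_T^{E}(\eta_0)\big)\big(\nabla\mathcal{L}_T^{E}(\eta_0)\big)'$$

in order to study efficiency of parameter estimates (see also Theorem 5.1).

Theorem 4.6 Let $X_{t,T}$ be a locally stationary process with correctly specified mean curve $\mu_\eta(u)$ and time varying spectral density $f_\eta(u,\lambda)$. Under suitable regularity conditions we have

$$\Gamma = \frac{1}{4\pi}\int_0^1\!\!\int_{-\pi}^{\pi} \big(\nabla\log f_{\eta_0}\big)\big(\nabla\log f_{\eta_0}\big)'\,d\lambda\,du + \frac{1}{2\pi}\int_0^1 \big(\nabla\mu_{\eta_0}(u)\big)\big(\nabla\mu_{\eta_0}(u)\big)'\, f_{\eta_0}^{-1}(u,0)\,du.$$

Proof. See Dahlhaus (1996b), Theorem 3.6. □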

We now briefly discuss how the time varying spectral density can be estimated. Following the discussion in the last section we start with a classical "stationary" smoothed periodogram estimate on a segment. Let $I_T(u,\lambda)$ be the tapered periodogram on a segment of length $N$ about $u$ as defined in (19). Even in the stationary case $I_T(u,\lambda)$ is not a consistent estimate of the spectrum and we have to smooth it over neighboring frequencies. Let therefore

$$\hat f_T(u,\lambda) := \frac{1}{b_f}\int K_f\Big(\frac{\lambda-\mu}{b_f}\Big)\, I_T(u,\mu)\,d\mu \tag{82}$$

where $K_f$ is a symmetric kernel with $\int K_f(x)\,dx = 1$ and $b_f$ is the bandwidth in frequency direction. Theorem 4.7 below shows that the estimate is implicitly also a kernel estimate in time direction with kernel

$$K_t(x) := \Big\{\int_0^1 h(x)^2\,dx\Big\}^{-1} h(x+1/2)^2, \qquad x \in [-1/2,\,1/2] \tag{83}$$

and bandwidth $b_t := N/T$, that is the estimate behaves like a kernel estimate with two convolution kernels in frequency and time direction. We mention that an asymptotically equivalent estimate is the kernel estimate

$$\tilde f_T(u,\lambda) := \frac{2\pi}{T^2}\sum_{t=1}^{T}\sum_{j=1}^{T} \frac{1}{b_t}K_t\Big(\frac{u-t/T}{b_t}\Big)\,\frac{1}{b_f}K_f\Big(\frac{\lambda-\lambda_j}{b_f}\Big)\, J_T\Big(\frac{t}{T},\lambda_j\Big) \tag{84}$$

with the pre-periodogram $J_T(u,\lambda)$ as defined in (88). One may also replace the integral in frequency direction in (82) by a sum over the Fourier frequencies.



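Theorem 4.7 Let $X_{t,T}$ be a locally stationary process with $\mu(\cdot) = 0$. Under suitable regularity conditions we have

(i) $\displaystyle \mathbf{E}\, I_T(u,\lambda) = f(u,\lambda) + \frac{1}{2}\, b_t^2 \int_{-1/2}^{1/2} x^2 K_t(x)\,dx\; \frac{\partial^2}{\partial u^2} f(u,\lambda) + o(b_t^2) + O\Big(\frac{\log(b_t T)}{b_t T}\Big)$;

(ii) $\displaystyle \mathbf{E}\, \hat f_T(u,\lambda) = f(u,\lambda) + \frac{1}{2}\, b_t^2 \int_{-1/2}^{1/2} x^2 K_t(x)\,dx\; \frac{\partial^2}{\partial u^2} f(u,\lambda) + \frac{1}{2}\, b_f^2 \int_{-1/2}^{1/2} x^2 K_f(x)\,dx\; \frac{\partial^2}{\partial \lambda^2} f(u,\lambda) + o\Big(b_t^2 + b_f^2 + \frac{\log(b_t T)}{b_t T}\Big)$;

(iii) $\displaystyle \operatorname{var}\big(\hat f_T(u,\lambda)\big) = (b_t b_f T)^{-1}\, 2\pi\, f(u,\lambda)^2 \int_{-1/2}^{1/2} K_t(x)^2\,dx \int_{-1/2}^{1/2} K_f(x)^2\,dx\; (1+\delta_{\lambda 0})$.

Proof. A sketch of the proof can be found in Dahlhaus (1996c), Theorem 2.2. □

Note that the first bias term of $\hat f$ is due to nonstationarity while the second is due to the variation of the spectrum in frequency direction.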

As in Remark 2.2 one may now minimize the relative mean squared error $\mathrm{RMSE}(\hat f) := \mathbf{E}\big(\hat f(u,\lambda)/f(u,\lambda) - 1\big)^2$ with respect to $b_f$, $b_t$ (i.e. $N$), $K_f$ and $K_t$ (i.e. the data taper $h$). This has been done in Dahlhaus (1996c), Theorem 2.3. The result says that with

$$\Delta_u := \frac{\partial^2}{\partial u^2} f(u,\lambda)\Big/ f(u,\lambda) \qquad\text{and}\qquad \Delta_\lambda := \frac{\partial^2}{\partial \lambda^2} f(u,\lambda)\Big/ f(u,\lambda)$$

the optimal RMSE is obtained with

$$b_t^{opt} = T^{-1/6}\,(576\pi)^{1/6}\,\Big(\frac{\Delta_\lambda}{\Delta_u^5}\Big)^{1/12}, \qquad b_f^{opt} = T^{-1/6}\,(576\pi)^{1/6}\,\Big(\frac{\Delta_u}{\Delta_\lambda^5}\Big)^{1/12}$$

and optimal kernels $K_t^{opt}(x) = K_f^{opt}(x) = 6\,(1/4 - x^2)$ with optimal rate $T^{-2/3}$.

The relations $b_t = N/T$ and (83) immediately lead to the optimal segment length and the optimal data taper $h$. The result is quite reasonable: If the degree of nonstationarity is small, then $\Delta_u$ is small and $b_t^{opt}$ gets large. If the variation of $f$ is small in frequency direction, then $\Delta_\lambda$ is small and $b_t^{opt}$ gets smaller (more smoothing is put in frequency direction than in time direction). This is another example of how the bias due to nonstationarity can be quantified with the approach of local stationarity and balanced with another bias term and a variance term. Of course the data-adaptive choice of the bandwidth parameters remains to be solved. Asymptotic normality of the estimates can be derived from Theorem 6.3 (cf. Dahlhaus (2009), Example 4.2).

Rosen et al. (2009) estimate the logarithm of the local spectrum by using a Bayesian mixture of splines. They assume that the log spectrum on a partition of the data is a mixture of individual log spectra and use a mixture of smoothing splines with time varying mixing weights to estimate the evolutionary log spectrum. Guo et al. (2003) use a smoothing spline ANOVA to estimate the time varying log spectrum.
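The trade-off between the two bandwidths of the kernel spectral estimate discussed above can be made concrete with a small computation. The following Python sketch assumes the optimal bandwidths have the form $b_t^{opt} = T^{-1/6}(576\pi)^{1/6}(\Delta_\lambda/\Delta_u^5)^{1/12}$ and $b_f^{opt} = T^{-1/6}(576\pi)^{1/6}(\Delta_u/\Delta_\lambda^5)^{1/12}$ with kernel $6(1/4-x^2)$, as we read the result of Dahlhaus (1996c); the function names are hypothetical and the sketch additionally assumes positive values of $\Delta_u$ and $\Delta_\lambda$.

```python
import numpy as np

def optimal_bandwidths(T, delta_u, delta_lam):
    """Return (b_t, b_f) for positive curvature ratios delta_u and delta_lam."""
    if delta_u <= 0 or delta_lam <= 0:
        raise ValueError("this sketch assumes positive delta_u and delta_lam")
    c = T ** (-1 / 6) * (576 * np.pi) ** (1 / 6)
    b_t = c * (delta_lam / delta_u ** 5) ** (1 / 12)   # bandwidth in time direction
    b_f = c * (delta_u / delta_lam ** 5) ** (1 / 12)   # bandwidth in frequency direction
    return b_t, b_f

def optimal_kernel(x):
    """K(x) = 6 (1/4 - x^2) on [-1/2, 1/2] and zero outside."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 0.5, 6 * (0.25 - x ** 2), 0.0)

# A nearly stationary spectrum (small delta_u) calls for a wide time window
b_t_smooth, b_f_smooth = optimal_bandwidths(T=1000, delta_u=0.1, delta_lam=1.0)
b_t_rough, b_f_rough = optimal_bandwidths(T=1000, delta_u=1.0, delta_lam=1.0)
```

Reducing $\Delta_u$ increases the time bandwidth and decreases the frequency bandwidth, which matches the qualitative discussion of the result.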



5 Gaussian likelihood theory for locally stationary processes 

The basics of the likelihood theory for univariate stationary processes were laid by Whittle 
(1953, 1954). His work was much later taken up and continued by many others. Among the 
large number of papers we mention the results of Dzhaparidze (1971) and Hannan (1973) 
for univariate time series, Dunsmuir (1979) for multivariate time series and e.g. Hosoya 
and Taniguchi (1982) for misspecified multivariate time series. A general overview of this
likelihood theory and in particular of Whittle estimates for stationary models may be found in
the monographs Dzhaparidze (1986) and Taniguchi and Kakizawa (2000).

From a practical point of view the most famous outcome of this theory is the Whittle likelihood

$$\mathcal{L}_T^{W}(\eta) := \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\eta(\lambda) + \frac{I_T(\lambda)}{f_\eta(\lambda)}\Big\}\,d\lambda \tag{85}$$

as an approximation of the negative log Gaussian likelihood (80) where $I_T(\lambda)$ is the periodogram. This likelihood has been used also beyond the classical framework, for example by Mikosch et al. (1995) for linear processes where the innovations have heavy tailed distributions, by Fox and Taqqu (1986) for long range dependent processes and by Robinson (1995) to construct semiparametric estimates for long range dependent processes.
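To make the Whittle likelihood concrete, the following Python sketch evaluates a Riemann-sum version of $\frac{1}{4\pi}\int\{\log 4\pi^2 f_\eta(\lambda) + I_T(\lambda)/f_\eta(\lambda)\}\,d\lambda$ over the Fourier frequencies for a stationary AR(1) model and minimizes it on a grid. It is a minimal illustration under our own assumptions: the innovation variance is fixed at 1, the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def periodogram_fourier(x):
    """I_T(lam_j) = |sum_t x_t exp(-i lam_j t)|^2 / (2 pi T) at lam_j = 2 pi j / T."""
    T = len(x)
    return np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * T)

def whittle_ar1(a, x, sigma2=1.0):
    """Whittle likelihood for the AR(1) density f_a(lam) = sigma2 / (2 pi |1 - a exp(-i lam)|^2)."""
    T = len(x)
    lam = 2 * np.pi * np.arange(T) / T
    f = sigma2 / (2 * np.pi * np.abs(1 - a * np.exp(-1j * lam)) ** 2)
    integrand = np.log(4 * np.pi ** 2 * f) + periodogram_fourier(x) / f
    # Riemann sum over one full period replaces the integral over [-pi, pi]
    return np.sum(integrand) * (2 * np.pi / T) / (4 * np.pi)

# Simulate a stationary AR(1) series with coefficient 0.5 (200 burn-in values discarded)
rng = np.random.default_rng(0)
T, a_true = 2000, 0.5
eps = rng.standard_normal(T + 200)
x = np.zeros(T + 200)
for t in range(1, T + 200):
    x[t] = a_true * x[t - 1] + eps[t]
x = x[200:]

grid = np.linspace(-0.95, 0.95, 191)                  # candidate coefficients, step 0.01
a_hat = grid[np.argmin([whittle_ar1(a, x) for a in grid])]
```

For this model the log term integrates to a constant, so the minimizer is essentially the lag-one sample autocorrelation; a grid search is used here only to keep the sketch short.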

The outcome of this likelihood theory goes far beyond the construction of the Whittle 
likelihood. Its technical core is the theory of Toeplitz matrices and in particular the approx- 
imation of the inverse of a Toeplitz matrix by the Toeplitz matrix of the inverse function. 
It is essentially this approximation which leads from the ordinary Gaussian likelihood to 
the Whittle likelihood. Beyond that the theory can be used to derive the convergence of 
experiments for Gaussian stationary processes in the Hajek-Le Cam sense, construct the 
properties of many tests and derive the properties of the exact MLE and the Whittle esti- 
mate (cf. Dzhaparidze (1986); Taniguchi and Kakizawa (2000)). 

For locally stationary processes it turns out that this likelihood theory can be generalized in a nice way such that the classical likelihood theory for stationary processes arises as a special case. Technically speaking this is achieved by a generalization of Toeplitz matrices tailored especially for locally stationary processes (the matrix $U_T(\phi)$ defined in (92)).

Some results coming from this theory have already been stated in Section 4, namely the limit of the Kullback-Leibler information divergence in Theorem 4.4 and the limit of the Fisher information in Theorem 4.6. We now describe further results. We start with a decomposition of the periodogram leading to a Whittle-type likelihood. We have

$$I_T(\lambda) = \frac{1}{2\pi T}\Big|\sum_{r=1}^{T} X_{r,T}\exp(-i\lambda r)\Big|^2 = \frac{1}{2\pi}\sum_{k=-(T-1)}^{T-1}\Big(\frac{1}{T}\sum_{t=1}^{T-|k|} X_{t,T}\,X_{t+|k|,T}\Big)\exp(-i\lambda k) \tag{86}$$

$$\phantom{I_T(\lambda)} = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{2\pi}\sum_{k:\,1\le [t+0.5+k/2],\,[t+0.5-k/2]\le T} X_{[t+0.5+k/2],T}\,X_{[t+0.5-k/2],T}\,\exp(-i\lambda k) = \frac{1}{T}\sum_{t=1}^{T} J_T\Big(\frac{t}{T},\lambda\Big) \tag{87}$$

where the so-called pre-periodogram

$$J_T(u,\lambda) := \frac{1}{2\pi}\sum_{k:\,1\le [uT+0.5+k/2],\,[uT+0.5-k/2]\le T} X_{[uT+0.5+k/2],T}\,X_{[uT+0.5-k/2],T}\,\exp(-i\lambda k) \tag{88}$$

may be regarded as a local version of the periodogram at time $t$. While the ordinary periodogram $I_T(\lambda)$ is the Fourier transform of the covariance estimator of lag $k$ over the whole segment (see (86)) the pre-periodogram just uses the pair $X_{[t+0.5+k/2]}\,X_{[t+0.5-k/2]}$ as a kind of "local estimator" of the covariance of lag $k$ at time $t$ (note that $[t+0.5+k/2] - [t+0.5-k/2] = k$). The pre-periodogram was introduced by Neumann and von Sachs (1997) as a starting point for a wavelet estimate of the time varying spectral density. The above decomposition means that the periodogram is the average of the pre-periodogram over time.
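The statement that the periodogram is the time average of the pre-periodogram can be verified numerically. The following Python sketch is a minimal illustration: it implements $J_T(t/T,\lambda) = \frac{1}{2\pi}\sum_k X_{[t+0.5+k/2]}X_{[t+0.5-k/2]}\exp(-i\lambda k)$ with $[\cdot]$ the floor function and the sum restricted to index pairs in $\{1,\dots,T\}$, and compares its average over $t$ with $I_T(\lambda)$. The function names and the simulated data are our own assumptions.

```python
import numpy as np

def periodogram(x, lam):
    """I_T(lam) = |sum_{r=1}^T x_r exp(-i lam r)|^2 / (2 pi T) at a single frequency."""
    T = len(x)
    r = np.arange(1, T + 1)
    return np.abs(np.sum(x * np.exp(-1j * lam * r))) ** 2 / (2 * np.pi * T)

def pre_periodogram(x, t, lam):
    """J_T(t/T, lam) for an integer time point t in 1..T (x[0] holds X_1)."""
    T = len(x)
    total = 0.0 + 0.0j
    for k in range(-2 * T, 2 * T + 1):
        r = (2 * t + 1 + k) // 2      # [t + 0.5 + k/2] via integer floor division
        s = (2 * t + 1 - k) // 2      # [t + 0.5 - k/2]
        if 1 <= r <= T and 1 <= s <= T:
            total += x[r - 1] * x[s - 1] * np.exp(-1j * lam * k)
    return total.real / (2 * np.pi)   # imaginary parts cancel between k and -k

rng = np.random.default_rng(0)
x = rng.standard_normal(12)
lam = 0.7
average = np.mean([pre_periodogram(x, t, lam) for t in range(1, len(x) + 1)])
full = periodogram(x, lam)            # equals `average` up to rounding error
```

The identity holds exactly because every index pair $(r,s)$ with $1\le r,s\le T$ is generated by exactly one combination of $t$ and $k$.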



If we replace $I_T(\lambda)$ in (85) by the above average of the pre-periodogram and afterwards replace the model spectral density $f_\eta(\lambda)$ by the time varying spectral density $f_\eta(u,\lambda)$ of a nonstationary model, we obtain the generalized Whittle likelihood

$$\mathcal{L}_T^{GW}(\eta) := \frac{1}{T}\sum_{t=1}^{T}\frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\eta\big(\tfrac{t}{T},\lambda\big) + \frac{J_T(\frac{t}{T},\lambda)}{f_\eta(\frac{t}{T},\lambda)}\Big\}\,d\lambda. \tag{89}$$

If the fitted model is stationary, i.e. $f_\eta(u,\lambda) = f_\eta(\lambda)$, then (due to (87)) the above likelihood is identical to the Whittle likelihood and we obtain the classical Whittle estimator. Thus the above likelihood is a true generalization of the Whittle likelihood to nonstationary processes. In Theorem 5.4 we show that this likelihood is a very close approximation to the Gaussian log likelihood for locally stationary processes. In particular (we conjecture that) it is a better approximation than the block Whittle likelihood $\mathcal{L}_T^{BW}(\eta)$ from (21).

We now briefly state the asymptotic normality result in the parametric case. An example is the tvAR(2)-model with polynomial parameter curves from Section 2.4. Let

$$\hat\eta_T^{GW} := \operatorname*{argmin}_{\eta}\, \mathcal{L}_T^{GW}(\eta) \tag{90}$$

be the corresponding quasi likelihood estimate, $\hat\eta_T^{ML}$ be the Gaussian MLE defined in (80), and $\eta_0$ as in (81), i.e. the model may be misspecified.

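Theorem 5.1 Let $X_{t,T}$ be a locally stationary process. Under suitable regularity conditions we have in the case $\mu(\cdot) = \mu_\eta(\cdot) = 0$

$$\sqrt{T}\,\big(\hat\eta_T^{GW} - \eta_0\big) \xrightarrow{\;D\;} \mathcal{N}\big(0,\, \Gamma^{-1} V\, \Gamma^{-1}\big) \qquad\text{and}\qquad \sqrt{T}\,\big(\hat\eta_T^{ML} - \eta_0\big) \xrightarrow{\;D\;} \mathcal{N}\big(0,\, \Gamma^{-1} V\, \Gamma^{-1}\big)$$

with

$$V_{ij} = \frac{1}{4\pi}\int_0^1\!\!\int_{-\pi}^{\pi} f^2\,\big(\nabla_i f_{\eta_0}^{-1}\big)\big(\nabla_j f_{\eta_0}^{-1}\big)\,d\lambda\,du$$

and

$$\Gamma_{ij} = \frac{1}{4\pi}\int_0^1\!\!\int_{-\pi}^{\pi} \big(f - f_{\eta_0}\big)\,\nabla_{ij} f_{\eta_0}^{-1}\,d\lambda\,du + \frac{1}{4\pi}\int_0^1\!\!\int_{-\pi}^{\pi} \big(\nabla_i \log f_{\eta_0}\big)\big(\nabla_j \log f_{\eta_0}\big)\,d\lambda\,du.$$

If the model is correctly specified then $V = \Gamma$ and $\Gamma$ is the same as in Theorem 4.6, that is both estimates are asymptotically Fisher-efficient. Even more, the sequence of experiments is locally asymptotically normal (LAN) and both estimates are locally asymptotically minimax.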

Proof. See Dahlhaus (2000), Theorem 3.1. LAN and LAM have been proved for the MLE in Dahlhaus (1996b), Theorem 4.1 and 4.2 - these results together with the LAM-property of the generalized Whittle estimate also follow from the technical lemmas in Dahlhaus (2000) (cf. Remark 3.3 in that paper). □

The corresponding result in the multivariate case and in the case $\mu(\cdot) \neq 0$ or $\mu_\eta(\cdot) \neq 0$ can be found in Dahlhaus (2000), Theorem 3.1.

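A deeper investigation of $\mathcal{L}_T^{GW}(\eta)$ reveals that it can be derived from the Gaussian log-likelihood by an approximation of the inverse of the covariance matrix. Let $X = (X_{1,T},\dots,X_{T,T})'$, $\mu = \big(\mu(\frac{1}{T}),\dots,\mu(\frac{T}{T})\big)'$, and $\Sigma_T(A,B)$ and $U_T(\phi)$ be $T\times T$ matrices with $(r,s)$-entry

$$\Sigma_T(A,B)_{r,s} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\exp\big(i\lambda(r-s)\big)\,A_{r,T}(\lambda)\,B_{s,T}(-\lambda)\,d\lambda \tag{91}$$

and

$$U_T(\phi)_{r,s} = \int_{-\pi}^{\pi}\exp\big(i\lambda(r-s)\big)\,\phi\Big(\frac{1}{T}\Big[\frac{r+s}{2}\Big]^{*},\lambda\Big)\,d\lambda \tag{92}$$

($r,s = 1,\dots,T$) where the functions $A_{r,T}(\lambda)$, $B_{r,T}(\lambda)$, $\phi(u,\lambda)$ fulfill certain regularity conditions ($A_{r,T}(\lambda)$, $B_{r,T}(\lambda)$ are transfer functions or their derivatives as defined in (77)). $[x]^{*} = [x]$ denotes the largest integer less or equal to $x$ (we have added the $*$ to discriminate the notation from brackets). Direct calculation shows that

$$\mathcal{L}_T^{GW}(\eta) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{4\pi}\int_{-\pi}^{\pi}\log\big[4\pi^2 f_\eta\big(\tfrac{t}{T},\lambda\big)\big]\,d\lambda + \frac{1}{2T}\,(X-\mu_\eta)'\;U_T\Big(\frac{1}{4\pi^2}f_\eta^{-1}\Big)\,(X-\mu_\eta). \tag{93}$$

Furthermore, the exact Gaussian likelihood is

$$\mathcal{L}_T^{E}(\eta) := \frac{1}{2}\log(2\pi) + \frac{1}{2T}\log\det\Sigma_\eta + \frac{1}{2T}\,(X-\mu_\eta)'\,\Sigma_\eta^{-1}\,(X-\mu_\eta) \tag{94}$$

where $\Sigma_\eta = \Sigma_T(A_\eta,A_\eta)$.

Proposition 5.2 below states that $U_T\big(\frac{1}{4\pi^2}f_\eta^{-1}\big)$ is an approximation of $\Sigma_\eta^{-1}$. Together with the generalization of the Szegö identity in Proposition 5.3 this implies that $\mathcal{L}_T^{GW}$ is an approximation of $\mathcal{L}_T^{E}$ (see Theorem 5.4). If the model is stationary, then $A_\eta$ is constant in time and $\Sigma_\eta = \Sigma_T(A_\eta,A_\eta)$ is the Toeplitz matrix of the spectral density $f_\eta(\lambda) = \frac{1}{2\pi}|A_\eta|^2$ while $U_T\big(\frac{1}{4\pi^2}f_\eta^{-1}\big)$ is the Toeplitz matrix of $\frac{1}{4\pi^2}f_\eta^{-1}$. This is the classical matrix approximation leading to the Whittle likelihood (cf. Dzhaparidze, 1986).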



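Proposition 5.2 Under suitable regularity conditions we have for each $\varepsilon > 0$ for the Euclidean norm

$$\frac{1}{T}\,\Big\|\Sigma_T(A,A)^{-1} - U_T\big(\{2\pi A\bar{A}'\}^{-1}\big)\Big\|_2^2 = O(T^{-1+\varepsilon}) \tag{95}$$

and

$$\frac{1}{T}\,\Big\|U_T(\phi)^{-1} - U_T\big(\{4\pi^2\phi\}^{-1}\big)\Big\|_2^2 = O(T^{-1+\varepsilon}).$$

Proof. See Dahlhaus (2000), Proposition 2.4. □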
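In the stationary case the matrix approximation can be inspected directly. The following Python sketch is a minimal illustration for a stationary AR(1) process with unit innovation variance: it compares the exact inverse covariance matrix with the Toeplitz matrix whose entries are $\int \exp(i\lambda(r-s))\,\{4\pi^2 f(\lambda)\}^{-1}d\lambda$ and computes $\frac{1}{T}$ times the squared Euclidean norm of the difference. The function names and the parameter values are our own assumptions.

```python
import numpy as np

def ar1_covariance(a, T):
    """Covariance matrix of a stationary AR(1) process with unit innovation variance."""
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return a ** lags / (1.0 - a ** 2)

def toeplitz_of(phi, T, n_grid=1024):
    """Matrix with (r, s)-entry int_{-pi}^{pi} exp(i lam (r - s)) phi(lam) d lam, phi real and even."""
    lam = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
    k = np.arange(-(T - 1), T)
    # Riemann sum on an equispaced grid; exact for trigonometric polynomials of low degree
    coef = (np.exp(1j * np.outer(k, lam)) @ phi(lam)).real * (2 * np.pi / n_grid)
    idx = np.subtract.outer(np.arange(T), np.arange(T)) + (T - 1)
    return coef[idx]

a, T = 0.6, 40
f = lambda lam: 1.0 / (2 * np.pi * np.abs(1 - a * np.exp(-1j * lam)) ** 2)   # AR(1) spectral density
sigma_inv = np.linalg.inv(ar1_covariance(a, T))
u_mat = toeplitz_of(lambda lam: 1.0 / (4 * np.pi ** 2 * f(lam)), T)
err = np.sum((sigma_inv - u_mat) ** 2) / T     # (1/T) * squared Euclidean norm of the difference
```

For the AR(1) model the two matrices differ only in the two corner entries of the diagonal, so the normalized squared error equals $2a^4/T$ and decays at rate $T^{-1}$.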

By using the above approximation it is possible to prove the following generalization of the Szegö identity (cf. Grenander and Szegö (1958), Section 5.2) to locally stationary processes.

Proposition 5.3 Under suitable regularity conditions we have with $f(u,\lambda) = \frac{1}{2\pi}|A(u,\lambda)|^2$ for each $\varepsilon > 0$

$$\frac{1}{T}\log\det\Sigma_T(A,A) = \frac{1}{2\pi}\int_0^1\!\!\int_{-\pi}^{\pi}\log\big[2\pi f(u,\lambda)\big]\,d\lambda\,du + O(T^{-1+\varepsilon}).$$

If $A = A_\eta$ depends on a parameter $\eta$ then the $O(T^{-1+\varepsilon})$ term is uniform in $\eta$.

Proof. See Dahlhaus (2000), Proposition 2.5. □

In certain situations the right hand side can be written in the form $\int_0^1 \log\big(2\pi\sigma^2(u)\big)\,du$ where $\sigma^2(u)$ is the one step prediction error at time $u$.

The mathematical core of the above results consists of the derivation of properties of products of matrices $\Sigma_T(A,B)$, $\Sigma_T(A,A)^{-1}$ and $U_T(\phi)$. These properties are derived in Dahlhaus (2000) in Lemma A.1, A.5, A.7 and A.8. These results are generalizations of corresponding results in the stationary case proved by several authors before.

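We now state the properties of the different likelihoods.

Theorem 5.4 Under suitable regularity conditions we have for $k = 0, 1, 2$

(i) $\displaystyle \sup_{\eta}\,\big|\nabla^k\{\mathcal{L}_T^{GW}(\eta) - \mathcal{L}_T^{E}(\eta)\}\big| \xrightarrow{\;P\;} 0$,

(ii) $\displaystyle \sup_{\eta}\,\big|\nabla^k\{\mathcal{L}_T^{GW}(\eta) - \mathcal{L}(\eta)\}\big| \xrightarrow{\;P\;} 0$,

(iii) $\displaystyle \sup_{\eta}\,\big|\nabla^k\{\mathcal{L}_T^{E}(\eta) - \mathcal{L}(\eta)\}\big| \xrightarrow{\;P\;} 0$.

Proof. See Dahlhaus (2000), Theorem 3.1. □

Under stronger assumptions one may also conclude that $\hat\eta_T^{GW} - \hat\eta_T^{ML} = O_p(T^{-1+\varepsilon})$ which means that $\hat\eta_T^{GW}$ is a close approximation of the MLE. A sketch of the proof is given in Dahlhaus (2000), Remark 3.4.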

Remark 5.5 It is interesting to compare the generalized Whittle estimate $\hat\eta_T^{GW}$ and its underlying approximation $U_T\big(\frac{1}{4\pi^2}f_\eta^{-1}\big)$ of $\Sigma_\eta^{-1}$ with the block Whittle estimate $\hat\eta_T^{BW}$ defined in (21). There some overlapping block Toeplitz matrices are used as an approximation which we regard as worse. A similar result as in Proposition 5.2 has been proved in Lemma 4.7 of Dahlhaus (1996a) for this approximation. We conjecture that also a similar result as in Theorem 5.4 with $\mathcal{L}_T^{BW}(\eta)$ can be proved and even more that $\hat\eta_T^{BW} - \hat\eta_T^{ML}$ = Op( yj^e + jj) (this is more a vague guess than a solid conjecture) which means that the latter approximation and presumably also the estimate $\hat\eta_T^{BW}$ are worse. It would be interesting to have more rigorous results and a careful simulation study with a comparison of both estimates. □

We now recall the generalized Whittle likelihood from (89) which was

$$\mathcal{L}_T^{GW}(\eta) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\eta\big(\tfrac{t}{T},\lambda\big) + \frac{J_T(\frac{t}{T},\lambda)}{f_\eta(\frac{t}{T},\lambda)}\Big\}\,d\lambda.$$

Contrary to the true Gaussian likelihood this is a sum over time and the summands can be interpreted as a local log likelihood at time point $t$. We therefore define

$$\ell^{*}_{t,T}(\theta) := \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log 4\pi^2 f_\theta(\lambda) + \frac{J_T(\frac{t}{T},\lambda)}{f_\theta(\lambda)}\Big\}\,d\lambda \tag{96}$$



(to avoid confusion we mention that we use the notation $\eta$ for a finite dimensional parameter which determines the whole curve, that is $\theta(\cdot) = \theta_\eta(\cdot)$ and $f_\eta(u,\lambda) = f_{\theta_\eta(u)}(\lambda)$). We can now construct all nonparametric estimates (26)-(30) with $\ell_{t,T}(\theta)$ replaced by $\ell^{*}_{t,T}(\theta)$, leading in each of the 5 cases to an alternative local quasi likelihood estimate.

The parametric estimator (30) with this local likelihood is the estimate $\hat\eta_T^{GW}$ from above. The orthogonal series estimator (28) with $\ell^{*}_{t,T}(\theta)$ has been investigated for a truncated wavelet series expansion together with nonlinear thresholding in Dahlhaus and Neumann (2001). The method is fully automatic and adapts to different smoothness classes. It is shown that the usual rates of convergence in Besov classes are attained up to a logarithmic factor. The nonparametric estimator (29) with $\ell^{*}_{t,T}(\theta)$ is studied in Dahlhaus and Polonik (2006). Rates of convergence, depending on the metric entropy of the function space, are derived. This includes in particular maximum likelihood estimates derived under shape restriction. The main tool for deriving these results is the so called empirical spectral process discussed in the next section. The kernel estimator (26) with $\ell^{*}_{t,T}(\theta)$ has been investigated in Dahlhaus (2009), Example 3.6. Uniform convergence has been proved in Dahlhaus and Polonik (2009), Section 4 (see also Example 6.6 and Theorem 6.9 below). The local polynomial fit (27) has not been investigated yet in combination with this likelihood.

The whole topic needs a more careful investigation - both theoretically and from a practical point of view including simulations and data examples.



6 Empirical spectral processes 

We now emphasize the relevance of the empirical spectral process for linear locally stationary time series. The theory of empirical processes does not only play a major role in proving theoretical results for statistical methods but also provides a deeper understanding of many techniques and the arising problems. The theory was first developed for stationary processes (cf. Dahlhaus (1988), Mikosch and Norvaisa (1997), Fay and Soulier (2001)) and then extended to locally stationary processes in Dahlhaus and Polonik (2006, 2009) and Dahlhaus (2009). The empirical spectral process is indexed by classes of functions. Basic results that later lead to several statistical applications are a functional central limit theorem, a maximal exponential inequality and a Glivenko-Cantelli type convergence result. All results use conditions based on the metric entropy of the index class. Many results stated earlier in this article have been proved by using these techniques.

The empirical spectral process is defined by

$$E_T(\phi) := \sqrt{T}\,\big(F_T(\phi) - F(\phi)\big)$$

where

$$F(\phi) := \int_0^1\!\!\int_{-\pi}^{\pi}\phi(u,\lambda)\,f(u,\lambda)\,d\lambda\,du \tag{97}$$

is the generalized spectral measure and

$$F_T(\phi) := \frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\phi\Big(\frac{t}{T},\lambda\Big)\,J_T\Big(\frac{t}{T},\lambda\Big)\,d\lambda \tag{98}$$

the empirical spectral measure with the pre-periodogram as defined in (88).

We first give an overview of statistics that can be written in the form $F_T(\phi)$ - several of them have already been discussed earlier in this article ($K_T$ always denotes a kernel function).



1. $\phi(u,\lambda) = K_T(u_0-u)\cos(\lambda k)$: local covariance estimator (10) a.s.; Remark 6.7
2. $\phi(u,\lambda) = K_T(u_0-u)\,K_T(\lambda_0-\lambda)$: spectral density estimator (84) a.s.; Remark 6.7
3. $\phi(u,\lambda) = K_T(u_0-u)\,\nabla f_{\theta_0}(u,\lambda)^{-1}$: $\nabla\mathcal{L}_T^{GW}(u_0,\theta_0)$, $\theta_0 = \theta_0(u_0)$; Example 6.6
4. $\phi(u,\lambda) = K_T(u_0-u)\,\nabla f_{\theta_0}(u,\lambda)^{-1}$: local least squares; Ex. 3.1, Rem. 6.7
5. $\phi(u,\lambda) = \nabla f_{\eta_0}(u,\lambda)^{-1}$: param. Whittle estimator; Example 3.7 in Dahlhaus and Polonik (2009)
6. $\phi(u,\lambda) = \big(I_{[0,u_0]}(u) - u_0\big)\,I_{[0,\lambda_0]}(\lambda)$: testing stationarity; Example 6.10
7. $\phi(u,\lambda) = \cos(\lambda k)$: stationary covariance; Remark 6.2
8. $\phi(u,\lambda) = \nabla f_{\eta_0}(\lambda)^{-1}$: stat. Whittle estimator; Remark 6.2
9. $\phi(u,\lambda) = K_T(\lambda_0-\lambda)$: stationary spectral density; Remark 6.2

Examples 1-4 and 9 are examples with index functions $\phi_T$ depending on $T$. More complex examples are nonparametric maximum likelihood estimation under shape restrictions (Dahlhaus and Polonik, 2006), model selection with a sieve estimator (Van Bellegem and Dahlhaus, 2006) and wavelet estimates (Dahlhaus and Neumann, 2001). Moreover $F_T(\phi)$ occurs with local polynomial fits (Kim, 2001; Jentsch, 2006) and several statistics suitable for goodness of fit testing. These applications are quite involved.

However, applications are limited to quadratic statistics, that is the empirical spectral measure is usually of no help in dealing with nonlinear models. Furthermore, for linear processes the empirical process only applies without further modification to the (score function and the Hessian of the) likelihood $\mathcal{L}_T^{GW}(\eta)$ and its local variant $\mathcal{L}_T^{GW}(u,\theta)$ and the local Whittle likelihood $\mathcal{L}_T^{W}(u,\theta)$. It also applies to the exact likelihood $\mathcal{L}_T^{E}(\eta)$ after proving $\nabla\mathcal{L}_T^{GW}(\eta_0) - \nabla\mathcal{L}_T^{E}(\eta_0) = o_p(T^{-1/2})$ (see also Theorem 5.4 (i)) and the conditional likelihoods $\mathcal{L}_T^{C}(\eta)$ and $\mathcal{L}_T^{C}(u,\theta)$ in the tvAR-case (see Remark 6.7 - in the general case this is not clear yet). For the block Whittle likelihood $\mathcal{L}_T^{BW}(\eta)$ it may also be applied after establishing $\nabla\mathcal{L}_T^{BW}(\eta_0) - \nabla\mathcal{L}_T^{GW}(\eta_0) = o_p(T^{-1/2})$. However, this is also not clear yet.

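We first state a central limit theorem for $E_T(\phi)$ with index functions $\phi$ that do not vary with $T$. We use the assumption of bounded variation in both components of $\phi(u,\lambda)$. Besides the definition in (61) we need a definition in two dimensions. Let

$$V^2(\phi) = \sup\Big\{\sum_{j,k=1}^{\ell,m}\big|\phi(u_j,\lambda_k) - \phi(u_{j-1},\lambda_k) - \phi(u_j,\lambda_{k-1}) + \phi(u_{j-1},\lambda_{k-1})\big| \;:\; 0\le u_0<\dots<u_\ell\le 1;\; -\pi\le\lambda_0<\dots<\lambda_m\le\pi;\; \ell,m\in\mathbb{N}\Big\}.$$

For simplicity we set

$$\|\phi\|_{\infty,V} := \sup_{u} V\big(\phi(u,\cdot)\big), \qquad \|\phi\|_{V,\infty} := \sup_{\lambda} V\big(\phi(\cdot,\lambda)\big), \qquad \|\phi\|_{V,V} := V^2(\phi) \qquad\text{and}\qquad \|\phi\|_{\infty,\infty} := \sup_{u,\lambda}|\phi(u,\lambda)|.$$

Theorem 6.1 Suppose Assumption 4.1 holds and let $\phi_1,\dots,\phi_d$ be functions with $\|\phi_j\|_{\infty,V}$, $\|\phi_j\|_{V,\infty}$, $\|\phi_j\|_{V,V}$ and $\|\phi_j\|_{\infty,\infty}$ being finite ($j = 1,\dots,d$). Then

$$\big(E_T(\phi_j)\big)_{j=1,\dots,d} \xrightarrow{\;D\;} \big(E(\phi_j)\big)_{j=1,\dots,d}$$

where $\big(E(\phi_j)\big)_{j=1,\dots,d}$ is a Gaussian random vector with mean 0 and

$$\operatorname{cov}\big(E(\phi_j),E(\phi_k)\big) = 2\pi\int_0^1\!\!\int_{-\pi}^{\pi}\phi_j(u,\lambda)\,\big[\phi_k(u,\lambda)+\phi_k(u,-\lambda)\big]\,f(u,\lambda)^2\,d\lambda\,du + \kappa_4\int_0^1\Big(\int_{-\pi}^{\pi}\phi_j(u,\lambda_1)f(u,\lambda_1)\,d\lambda_1\Big)\Big(\int_{-\pi}^{\pi}\phi_k(u,\lambda_2)f(u,\lambda_2)\,d\lambda_2\Big)\,du. \tag{99}$$

Proof. See Dahlhaus and Polonik (2009), Theorem 2.5. □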



Remark 6.2 (Stationary processes/Model misspecification by stationary models)

The classical central limit theorem for the weighted periodogram in the stationary case can be obtained as a corollary: If $\phi(u,\lambda) = \tilde\phi(\lambda)$ is time-invariant then

$$F_T(\phi) = \int_{-\pi}^{\pi}\tilde\phi(\lambda)\,\frac{1}{T}\sum_{t=1}^{T}J_T\Big(\frac{t}{T},\lambda\Big)\,d\lambda = \int_{-\pi}^{\pi}\tilde\phi(\lambda)\,I_T(\lambda)\,d\lambda$$

(see (87)), that is $F_T(\phi)$ is the classical spectral measure in the stationary case with the following applications:

(i) $\phi(u,\lambda) = \tilde\phi(\lambda) = \cos\lambda k$ is the empirical covariance estimator of lag $k$;

(ii) $\phi(u,\lambda) = \tilde\phi(\lambda) = \nabla f_\theta(\lambda)^{-1}$ is the score function of the Whittle likelihood.

Theorem 6.1 gives the asymptotic distribution for these examples - both in the stationary case and in the misspecified case where the true underlying process is only locally stationary. If $\phi(u,\lambda) = \tilde\phi(\lambda)$ is a kernel we obtain an estimate of the spectral density whose asymptotic distribution is a special case of Theorem 6.3 below (also in the misspecified case). □
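The first application in Remark 6.2 can be checked numerically: for the time-invariant index function $\cos(\lambda k)$ the spectral measure $\int \cos(\lambda k)\,I_T(\lambda)\,d\lambda$ coincides with the empirical covariance $\frac{1}{T}\sum_t X_t X_{t+k}$. The following Python sketch is a minimal illustration with simulated data; the function names are our own, and the integral is replaced by a Riemann sum on a grid of $4T$ equispaced frequencies, which is exact here because the integrand is a trigonometric polynomial of low degree.

```python
import numpy as np

def periodogram(x, lam):
    """I_T(lam) = |sum_{r=1}^T x_r exp(-i lam r)|^2 / (2 pi T) on an array of frequencies."""
    T = len(x)
    r = np.arange(1, T + 1)
    return np.abs(np.exp(-1j * np.outer(lam, r)) @ x) ** 2 / (2 * np.pi * T)

def spectral_measure_stationary(x, phi, n_grid=None):
    """F_T(phi) = int_{-pi}^{pi} phi(lam) I_T(lam) d lam for a time-invariant index function phi."""
    n = n_grid or 4 * len(x)
    lam = -np.pi + 2 * np.pi * np.arange(n) / n
    return np.sum(phi(lam) * periodogram(x, lam)) * (2 * np.pi / n)

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
k = 3
via_measure = spectral_measure_stationary(x, lambda lam: np.cos(lam * k))
direct = np.sum(x[:-k] * x[k:]) / len(x)      # (1/T) sum_t x_t x_{t+k}
```

Both numbers agree up to rounding error, which makes explicit how a covariance estimate is a special case of the statistic $F_T(\phi)$.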

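We now state a central limit theorem for $F_T(\phi_T) - F(\phi_T)$ with index functions $\phi_T$ depending on $T$. In addition we extend the hitherto definitions to tapered data

$$X_{t,T}^{(h_T)} := h_T\Big(\frac{t}{T}\Big)\cdot X_{t,T}$$

where $h_T : (0,1] \to [0,\infty)$ is a data taper (with $h_T(\cdot) = I_{(0,1]}(\cdot)$ being the nontapered case). The main reason for introducing data tapers is to include segment estimates - see the discussion below. As before the empirical spectral measure is defined by

$$F_T(\phi) = F_T^{(h_T)}(\phi) := \frac{1}{T}\sum_{t=1}^{T}\int_{-\pi}^{\pi}\phi\Big(\frac{t}{T},\lambda\Big)\,J_T^{(h_T)}\Big(\frac{t}{T},\lambda\Big)\,d\lambda \tag{101}$$

now with the tapered pre-periodogram

$$J_T^{(h_T)}\Big(\frac{t}{T},\lambda\Big) := \frac{1}{2\pi}\sum_{k:\,1\le [t+1/2\pm k/2]\le T} X^{(h_T)}_{[t+1/2+k/2],T}\,X^{(h_T)}_{[t+1/2-k/2],T}\,\exp(-i\lambda k) \tag{102}$$

(we mention that in some cases a rescaling may be necessary for $J_T^{(h_T)}(u,\lambda)$ to become a pre-estimate of $f(u,\lambda)$ - an obvious example for this is $h_T(u) = (1/2)\,I_{(0,1]}(u)$).

$F(\phi)$ is the theoretical counterpart of $F_T(\phi)$

$$F(\phi) = F^{(h_T)}(\phi) := \int_0^1 h_T^2(u)\int_{-\pi}^{\pi}\phi(u,\lambda)\,f(u,\lambda)\,d\lambda\,du. \tag{103}$$

Note that (87) also holds with a data taper, that is

$$\frac{1}{T}\sum_{t=1}^{T}J_T^{(h_T)}\Big(\frac{t}{T},\lambda\Big) = \frac{H_{2,T}}{T}\,I_T^{(h_T)}(\lambda)$$

with the tapered periodogram

$$I_T^{(h_T)}(\lambda) := \frac{1}{2\pi H_{2,T}}\Big|\sum_{s=1}^{T}X_{s,T}^{(h_T)}\exp(-i\lambda s)\Big|^2 \qquad\text{where}\qquad H_{2,T} := \sum_{t=1}^{T}h_T\Big(\frac{t}{T}\Big)^2. \tag{104}$$

An important special case is $h_T^{(u_0)}(\frac{t}{T}) := k\big(\frac{u_0-t/T}{b_T}\big)$ with bandwidth $b_T$ and $k$ having compact support on $[-\frac{1}{2},\frac{1}{2}]$. If $b_T := N/T$ then $I_T^{(h_T)}(\lambda) = I_T(u_0,\lambda)$ with $I_T(u_0,\lambda)$ as in (19). If in addition $\phi(u,\lambda) = \psi(\lambda)$ we obtain

$$F_T(\phi) = \frac{H_{2,T}}{T}\int_{-\pi}^{\pi}\psi(\lambda)\,I_T^{(h_T)}(\lambda)\,d\lambda.$$

For example for $\psi(\lambda) := \exp(i\lambda k)$ this is exactly $\frac{H_{2,T}}{T}\,\hat c_T(u_0,k)$ with the tapered covariance estimate from (8). In this case $H_{2,T}/T$ is proportional to $b_T$.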



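The last example suggests to use $1/H_{2,T}$ instead of $1/T$ in (101) as a norming constant. However, this is not always the right choice (as can be seen from case (ii) in Remark 6.5).

It turns out that in the above situation the rate of convergence of the empirical spectral measure becomes $\sqrt{T}/\rho_2^{(h_T)}(\phi_T)$ where

$$\rho_2^{(h_T)}(\phi) := \Big(\int_0^1 h_T^4(u)\int_{-\pi}^{\pi}\phi(u,\lambda)^2\,d\lambda\,du\Big)^{1/2}.$$

Therefore we can embed this case into the situation treated in the last section by studying the convergence of

$$E_T^{(h_T)}\Big(\frac{\phi_T}{\rho_2^{(h_T)}(\phi_T)}\Big) = \frac{\sqrt{T}}{\rho_2^{(h_T)}(\phi_T)}\,\Big(F_T^{(h_T)}(\phi_T) - F^{(h_T)}(\phi_T)\Big).$$

Furthermore, let

$$c_E^{(h_T)}(\phi_j,\phi_k) := 2\pi\int_0^1 h_T^4(u)\int_{-\pi}^{\pi}\phi_j(u,\lambda)\,\big[\phi_k(u,\lambda)+\phi_k(u,-\lambda)\big]\,f(u,\lambda)^2\,d\lambda\,du + \kappa_4\int_0^1 h_T^4(u)\Big(\int_{-\pi}^{\pi}\phi_j(u,\lambda_1)f(u,\lambda_1)\,d\lambda_1\Big)\Big(\int_{-\pi}^{\pi}\phi_k(u,\lambda_2)f(u,\lambda_2)\,d\lambda_2\Big)\,du. \tag{105}$$

Theorem 6.3 Suppose that $X_{t,T}$ is a locally stationary process and suitable regularity conditions hold. If the limit

$$\Sigma_{j,k} := \lim_{T\to\infty}\frac{c_E^{(h_T)}(\phi_{Tj},\phi_{Tk})}{\rho_2^{(h_T)}(\phi_{Tj})\,\rho_2^{(h_T)}(\phi_{Tk})} \tag{106}$$

exists for all $j,k = 1,\dots,d$ then

$$\Big(\frac{\sqrt{T}}{\rho_2^{(h_T)}(\phi_{Tj})}\,\big(F_T(\phi_{Tj}) - F^{(h_T)}(\phi_{Tj})\big)\Big)_{j=1,\dots,d} \xrightarrow{\;D\;} \mathcal{N}(0,\Sigma). \tag{107}$$

Remark 6.4 (Bias) In addition we have the bias term

$$\frac{\sqrt{T}}{\rho_2^{(h_T)}(\phi_T)}\,\Big(F^{(h_T)}(\phi_T) - \lim_{T\to\infty}F^{(h_T)}(\phi_T)\Big).$$

The magnitude of this bias depends on the smoothness of the time varying spectral density. In this section we usually require conditions such that this bias is of lower order. This is different in Section 3 where the bias has explicitly been investigated. □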

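Remark 6.5 (Typical applications) A typical application of this result is the case of kernel type local estimators which can be constructed by using kernels, data tapers or a combination of both:

(i) $\phi_T(u,\lambda) = \frac{1}{b_T}K\big(\frac{u_0-u}{b_T}\big)\,\psi(\lambda)$, $\quad h_T(\cdot) = I_{(0,1]}(\cdot)$

(ii) $\phi_T(u,\lambda) = \frac{1}{b_T}K\big(\frac{u_0-u}{b_T}\big)\,\psi(\lambda)$, $\quad h_T(u) = I_{[u_0-b_T/2,\,u_0+b_T/2]}(u)$

(iii) $\phi_T(u,\lambda) = \frac{1}{b_T}\,\psi(\lambda)$, $\quad h_T(u) = k\big(\frac{u_0-u}{b_T}\big)$

where $K(\cdot)$ and $k(\cdot)$ are kernel functions and $b_T$ is the bandwidth. If $K(\cdot) = k(\cdot)^2$ then the resulting estimates all have the same asymptotic properties - see below. Dependent on the function $\psi(\lambda)$ this leads to different applications: If we set $\psi(\lambda) = \cos(\lambda k)$ the estimate (iii) is the estimate $\hat c_T(u_0,k)$ from (8) and (i) is "almost" the estimate $\tilde c_T(u_0,k)$ from (9) (for $k$ even it is exactly the same, for $k$ odd the difference can be treated with the methods mentioned in Remark 6.7).

We now show how Theorem 6.3 leads to the asymptotic distribution for these estimates:

(i) If $K(\cdot)$ and $\psi(\cdot)$ are of bounded variation and $b_T \to 0$, $b_T T \to \infty$ then the regularity conditions of Theorem 6.3 are fulfilled (see Dahlhaus (2009), Remark 3.4). Furthermore,

$$\rho_2^{(h_T)}(\phi_T) = \rho_2(\phi_T) = \Big(\frac{1}{b_T}\int K^2(x)\,dx\int_{-\pi}^{\pi}|\psi(\lambda)|^2\,d\lambda\Big)^{1/2} \sim b_T^{-1/2}. \tag{108}$$

For $f(\cdot,\lambda)$ continuous at $u_0$ we have

$$\frac{c_E(\phi_{Tj},\phi_{Tk})}{\rho_2(\phi_{Tj})\,\rho_2(\phi_{Tk})} \to \Gamma_{jk} := \frac{1}{\big(\int|\psi_j(\lambda)|^2d\lambda\,\int|\psi_k(\lambda)|^2d\lambda\big)^{1/2}}\Big[2\pi\int_{-\pi}^{\pi}\psi_j(\lambda)\,\big[\psi_k(\lambda)+\psi_k(-\lambda)\big]\,f(u_0,\lambda)^2\,d\lambda + \kappa_4\Big(\int_{-\pi}^{\pi}\psi_j(\lambda_1)f(u_0,\lambda_1)\,d\lambda_1\Big)\Big(\int_{-\pi}^{\pi}\psi_k(\lambda_2)f(u_0,\lambda_2)\,d\lambda_2\Big)\Big]$$

that is (106) is also fulfilled and we obtain from Theorem 6.3

$$\Big(\frac{\sqrt{T}}{\rho_2(\phi_{Tj})}\,\big(F_T(\phi_{Tj}) - F(\phi_{Tj})\big)\Big)_{j=1,\dots,d} \xrightarrow{\;D\;} \mathcal{N}(0,\Gamma). \tag{109}$$

(ii) The additional taper $h_T(u) = I_{[u_0-b_T/2,\,u_0+b_T/2]}(u)$ implies that we use only data from the interval $[u_0-b_T/2,\,u_0+b_T/2]$. We obtain in this case

$$\rho_2^{(h_T)}(\phi_T) = \Big(\frac{1}{b_T}\int_{-1/2}^{1/2}K^2(x)\,dx\int_{-\pi}^{\pi}|\psi(\lambda)|^2\,d\lambda\Big)^{1/2},$$

i.e. we have the same $\rho_2^{(h_T)}(\phi_T)$ as above. Furthermore, $c_E^{(h_T)}(\phi_{Tj},\phi_{Tk})$ is the same. Thus we obtain the same asymptotic distribution and the same rate of convergence.

(iii) If $K(\cdot) = k(\cdot)^2$ we obtain in this case

$$\rho_2^{(h_T)}(\phi_T) = \Big(\frac{1}{b_T}\int k^4(x)\,dx\int_{-\pi}^{\pi}|\psi(\lambda)|^2\,d\lambda\Big)^{1/2} = \Big(\frac{1}{b_T}\int K^2(x)\,dx\int_{-\pi}^{\pi}|\psi(\lambda)|^2\,d\lambda\Big)^{1/2},$$

i.e. we obtain again the same expression. Furthermore, $c_E^{(h_T)}(\phi_{Tj},\phi_{Tk})$ is the same as $c_E(\phi_{Tj},\phi_{Tk})$ above. Thus we have again the same asymptotic distribution and the same rate of convergence. □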



Example 6.6 (Curve estimation by local quasi likelihood estimates) 



Local Whittle estimates on a segment were defined in (17) and discussed in Example 3.6
(the bias was heuristically derived in Example 3.5). We now consider the presumably equivalent
local quasi likelihood estimate defined by

$\hat\theta_T^{GW}(u_0) := \arg\min_{\theta\in\Theta} \mathcal{L}_T^{GW}(u_0,\theta)$ (110)

with

$\mathcal{L}_T^{GW}(u_0,\theta) := \frac{1}{4\pi}\,\frac{1}{T}\sum_{t=1}^{T} \frac{1}{b_T} K\Big(\frac{u_0 - t/T}{b_T}\Big) \int_{-\pi}^{\pi} \Big\{ \log 4\pi^2 f_\theta(\lambda) + \frac{J_T(t/T,\lambda)}{f_\theta(\lambda)} \Big\}\,d\lambda$ (111)

(this is a combination of (26) and (96)). The asymptotic normality of the estimate $\hat\theta_T^{GW}(u_0)$
is derived in Dahlhaus (2009), Example 3.6. Key steps in the proof are the fact that both
the score function and the Hessian matrix can be written in terms of the empirical spectral
process leading to a rather simple proof. For example

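$\sqrt{b_T T}\,\nabla_i \mathcal{L}_T^{GW}(u_0,\theta_0(u_0)) = \sqrt{b_T T}\,\big( F_T(\phi_{T,u_0,i}) - F(\phi_{T,u_0,i}) \big) + o_p(1)$ (112)

where $\phi_{T,u_0,i}(v,\lambda) := \frac{1}{b_T} K\big(\frac{u_0 - v}{b_T}\big)\,\frac{1}{4\pi}\,\nabla_i f_\theta(\lambda)^{-1}\big|_{\theta=\theta_0(u_0)}$. Theorem 6.3 then immediately gives
the asymptotic normality of the score function and after some additional considerations also
asymptotic normality of $\hat\theta_T^{GW}(u_0)$. For details see Dahlhaus (2009), Example 3.6.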



The above estimate corresponds to case (i) in Remark 6.5. Case (iii) in Remark 6.5 leads
instead to the tapered Whittle estimate $\hat\theta_T^{W}(u_0)$ on the segment, since for
$h_T(\frac{t}{T}) := k\big(\frac{u_0 - t/T}{b_T}\big)$ we have $I_T^{(h_T)}(\lambda) = I_T(u_0,\lambda)$ with $I_T(u_0,\lambda)$ as in (19). This estimate has the
same asymptotic properties provided $k(\cdot)^2 = K(\cdot)$. Its asymptotic properties can now also
be derived by using Theorem 6.3. □



Remark 6.7 (Related estimates) Many estimates are only approximately of the form
discussed above - for example the sum statistic

$\frac{2\pi}{T^2} \sum_{t=1}^{T} \sum_{j=1}^{T} \phi\big(\frac{t}{T},\lambda_j\big)\, J_T\big(\frac{t}{T},\lambda_j\big)$ (113)

where $\lambda_j = 2\pi j/T$ - or representations in terms of the Fourier-coefficients. Important examples



of related estimates are the spectral density estimate (84), the covariance estimates (9) and



(10) and the score function of the local least squares tvAR(p)-estimate from Example 3.1.

We mention that the central limit theorem in Theorem 6.3 also holds for several modified
estimators. Details and proofs can be found in Dahlhaus (2009), Section 4. □



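We now briefly mention the exponential inequality. Since this is a non-asymptotic result it
holds regardless of whether $\phi$ depends on $T$. Let $\rho_{2,T}(\phi) := \big( \frac{1}{T} \sum_{t=1}^{T} \int_{-\pi}^{\pi} \phi(\frac{t}{T},\lambda)^2\,d\lambda \big)^{1/2}$.

Theorem 6.8 (Exponential inequality) Under suitable regularity conditions we have for
all $\eta > 0$

$P\Big( \big| \sqrt{T}\,\big( F_T(\phi) - \mathbf{E}F_T(\phi) \big) \big| \ge \eta \Big) \le c_1 \exp\Big( -c_2 \sqrt{\frac{\eta}{\rho_{2,T}(\phi)}} \Big)$ (114)

with some constants $c_1, c_2 > 0$ independent of $T$.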



This result is proved in Dahlhaus and Polonik (2009), Theorem 2.7. There exist several
versions of this result - for example in the Gaussian case it is possible to omit the $\sqrt{\cdot}$ in
(114) or to prove a Bernstein-type inequality which is even stronger (cf. Dahlhaus and
Polonik, 2006, Theorem 4.1).

Subsequently, a maximal inequality, i.e. an exponential inequality for $\sup_{\phi\in\Phi} |E_T(\phi)|$ has
been proved in Dahlhaus and Polonik (2009), Theorem 2.9 under conditions on the metric
entropy of the corresponding function class $\Phi$. We refer to that paper for details.

With the maximal inequality tightness of the empirical spectral process can be proved leading
to a functional central limit theorem for the empirical spectral process indexed by a function
class (cf. Dahlhaus and Polonik (2009), Theorem 2.11). Furthermore a Glivenko-Cantelli
type result for the empirical spectral process can be obtained (Theorem 2.12).

Other applications of the maximal inequality are for example uniform rates of convergence
for different estimates. As an example we now state a uniform convergence result for the
local quasi likelihood estimate $\hat\theta_T^{GW}(u_0)$ from (110).



Theorem 6.9 Let $X_{t,T}$ be a locally stationary process with $\mu(\cdot) = 0$. Under suitable regularity
conditions (in particular under the assumption that $f_\theta(\lambda)$ is twice differentiable in $\theta$
with uniformly Lipschitz continuous derivatives in $\lambda$) we have for $b_T T \gg (\log T)^6$

$\sup_{u_0} \big\| \hat\theta_T^{GW}(u_0) - \theta_0(u_0) \big\| = O_p\Big( \frac{1}{\sqrt{b_T T}} + b_T^2 \Big),$

that is for $b_T \sim T^{-1/5}$ we obtain the uniform rate $O_p(T^{-2/5})$.

Proof. The result has been proved in Dahlhaus and Polonik (2009), Theorem 4.1. 
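The uniform rate can be checked by balancing the two terms of the bound. The following short calculation assumes that the bound in Theorem 6.9 has the form $O_p\big(1/\sqrt{b_T T} + b_T^2\big)$, i.e. a stochastic term plus a smoothing bias term:

```latex
\frac{1}{\sqrt{b_T T}} \asymp b_T^{2}
\;\Longleftrightarrow\;
b_T^{5/2} \asymp T^{-1/2}
\;\Longleftrightarrow\;
b_T \asymp T^{-1/5},
\qquad\text{and then}\qquad
\frac{1}{\sqrt{b_T T}} + b_T^{2} \asymp T^{-2/5}.
```

Any other order of the bandwidth lets one of the two terms dominate and therefore gives a slower rate.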

Example 6.10 (Testing for stationarity) Another application of the maximal inequality
is the derivation of a functional central limit theorem for the empirical spectral process. A possible
application is a test for stationarity. We briefly present the idea - although we clearly mention
that the construction below is finally not successful. The idea for a test of stationarity
is to test whether the time varying spectral density $f(u,\lambda)$ is constant in $u$. This is for
example achieved by the test statistic

$\sqrt{T} \sup_{u\in[0,1]} \sup_{\lambda\in[0,\pi]} \big| F_T(u,\lambda) - u\,F_T(1,\lambda) \big|$ (115)

where

$F_T(u,\lambda) := \frac{1}{T} \sum_{t=1}^{[uT]} \int_0^{\lambda} J_T\big(\frac{t}{T},\mu\big)\,d\mu$

is an estimate of the integrated time frequency spectral density $F(u,\lambda) := \int_0^u \int_0^\lambda f(v,\mu)\,d\mu\,dv$,
and

$u\,F_T(1,\lambda) = u \int_0^{\lambda} I_T(\mu)\,d\mu$

is the corresponding estimate of $F(u,\lambda)$ under the hypothesis of stationarity where $f(v,\mu) = f(\mu)$.
Under the hypothesis of stationarity we have

$F(u,\lambda) - u\,F(1,\lambda) = \int_0^1 \int_0^{\lambda} \big( I_{[0,u]}(v) - u \big) f(\mu)\,d\mu\,dv = 0$

and therefore

$\sqrt{T}\,\big( F_T(u,\lambda) - u\,F_T(1,\lambda) \big) = E_T(\phi_{u,\lambda})$

with $\phi_{u,\lambda}(v,\mu) = \big( I_{[0,u]}(v) - u \big)\, I_{[0,\lambda]}(\mu)$. We now need functional convergence of $E_T(\phi_{u,\lambda})$.
Convergence of the finite dimensional distributions follows from Theorem 6.1 above. Tightness
and therefore the functional convergence follows from Theorem 2.11 of Dahlhaus and
Polonik (2009). As a consequence we obtain under the null hypothesis

$\sqrt{T}\,\big( F_T(u,\lambda) - u\,F_T(1,\lambda) \big)_{u\in[0,1],\,\lambda\in[0,\pi]} \xrightarrow{\;D\;} \big( E(u,\lambda) \big)_{u\in[0,1],\,\lambda\in[0,\pi]}$

where $E(u,\lambda)$ is a Gaussian process. If $\kappa_4 = 0$ (Gaussian case) and $f(\mu) = c$ it can be shown
that this is the Kiefer-Müller process. However, for general $f$ it is a difficult and unsolved
task to calculate or estimate the limit distribution and in particular the distribution of
the test statistic in (115). This may be done by transformations (like $U_p$- or $T_p$-type
transforms) and/or by finding an adequate bootstrap method.

We mention that Paparoditis (2009, 2010) has given two different solutions of this testing
problem. □



7 Additional topics and further references 

This section gives an overview of additional topics with further references. We concentrate
on work which uses the infill asymptotic approach of local stationarity. Even in this case it 
is not possible to give a complete overview. 

1. Locally stationary wavelet processes: There exists a large number of papers on the use 
of wavelets for modeling locally stationary processes. The first type of application is to 
estimate the parameter curves via the use of wavelets. This has been mentioned a few times
in the above presentation (cf. (28)).

A breakthrough for the application of wavelets to nonstationary processes was the introduction
of "locally stationary wavelet processes" by Nason et al. (2000). This class is somehow
the counterpart to the representation (60) for locally stationary processes. It also uses a
rescaling argument - thus making all methods for these processes accessible to a meaningful
asymptotic theory. Locally stationary wavelet processes are processes with the wavelet
representation

$X_{t,T} = \sum_{j=1}^{\infty} \sum_{k=-\infty}^{\infty} w_{j,k;T}\,\psi_{j,k-t}\,\xi_{j,k}$ (116)

where $\{\xi_{j,k}\}$ are a collection of uncorrelated random variables with mean 0 and variance
1, the $\{\psi_{j,t}\}$ are a set of discrete nondecimated wavelets (compactly supported oscillatory
vectors with support proportional to $2^j$), and $\{w_{j,k;T}\}$ are a collection of amplitudes that are
smooth in a particular way as a function of $k$. The smoothness of $\{w_{j,k;T}\}$ controls the degree
of local stationarity of $X_{t,T}$. The spectrum is linked to the process by $w_{j,k;T}^2 \approx S_j(k/T)$.
Nason et al. (2000) also define the "evolutionary wavelet spectrum" and show how this can
be estimated by a smoothed wavelet periodogram. In addition this leads to an estimate of
the local covariance. An introduction to LSW-processes and an overview on early results for
such processes can be found in Nason and von Sachs (1999). Fryzlewicz and Nason (2006)
suggest the use of a Haar-Fisz method for the estimation of evolutionary wavelet spectra 
by combining Haar wavelets and the variance stabilizing Fisz transform. Van Bellegem and 
von Sachs (2008) consider wavelet processes whose spectral density function changes very 
quickly in time. By using a wavelet-type transform of the autocovariance function with 
respect to so-called autocorrelation wavelets they propose a pointwise adaptive estimator of 
the time varying autocovariance and the time varying spectrum. 

Furthermore, several papers mentioned below use the framework of LSW-processes. 
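
A minimal simulation sketch of a wavelet representation of the type (116), written as $X_{t,T} = \sum_j \sum_k w_{j,k;T}\,\psi_{j,k-t}\,\xi_{j,k}$. Several ingredients are assumptions of the example and not part of the text: Haar wavelets as the discrete nondecimated wavelets, Gaussian $\xi_{j,k}$, amplitudes of the form $w_{j,k;T} = W_j(k/T)$ with user-supplied functions $W_j$, and truncation of the sum to finitely many scales. The function names are hypothetical.

```python
import numpy as np


def haar_wavelet(j):
    """Discrete Haar wavelet at scale j >= 1: a vector of length 2**j with unit norm."""
    half = 2 ** (j - 1)
    return np.concatenate([np.ones(half), -np.ones(half)]) / 2 ** (j / 2)


def simulate_lsw(T, amplitudes, seed=0):
    """Simulate X_t = sum_j sum_k w_{j,k} psi_{j,k-t} xi_{j,k} for t = 0, ..., T-1.

    amplitudes : list of callables W_j on [0, 1] for scales j = 1, ..., J
                 (assumption: w_{j,k;T} = W_j(k/T), clipped at rescaled time 1).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    for j, W in enumerate(amplitudes, start=1):
        psi = haar_wavelet(j)
        L = len(psi)
        k = np.arange(T + L - 1)                     # psi_{j,k-t} is nonzero for 0 <= k-t < L
        w = np.array([W(min(kk / T, 1.0)) for kk in k], dtype=float)
        xi = rng.standard_normal(T + L - 1)          # uncorrelated, mean 0, variance 1
        # np.correlate(a, v, "valid")[t] = sum_n a[t+n] * v[n]
        x += np.correlate(w * xi, psi, mode="valid")
    return x


if __name__ == "__main__":
    # Scale-1 amplitude grows over time, scale-3 amplitude decays.
    x = simulate_lsw(1024, [lambda u: 0.5 + u, lambda u: 0.0, lambda u: 1.5 - u])
    print(x[:5])
```

Since each Haar vector has unit norm, the local variance of the simulated series at rescaled time $u$ is approximately $\sum_j W_j(u)^2$, which matches the interpretation of the squared amplitudes as a spectrum.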

2. Multivariate locally stationary processes: We first mention that in particular the Gaussian
likelihood theory for locally stationary processes from Section 5 also holds for multivariate
processes - see Dahlhaus (2000).

Beyond that Chiann and Morettin (1999, 2005) investigate the estimation of time varying 
coefficients of a linear system where the input and output are locally stationary processes. 
They study different estimation techniques in the frequency- and time domain. 

Ombao et al. (2001) analyze bivariate nonstationary time series. They use SLEX functions 
(time-localized generalization of the Fourier waveform) and propose a method that automat- 
ically segments the time series into approximately stationary blocks and selects the span to 
be used to obtain the smoothed estimates of the time varying spectra and coherence. Ombao
et al. (2005) use the SLEX framework to build a family of multivariate models that can
explicitly characterize the time varying spectral and coherence properties of a multivariate 
time series. Ombao and Van Bellegem (2008) estimate the time varying coherence by using 
time-localized linear filtering. Their method automatically selects via tests of homogeneity 
the optimal window width for estimating local coherence. 

Motta et al. (2011) propose a locally stationary factor model for large cross-section and
time dimensions. Factor loadings are estimated by the eigenvectors of a nonparametrically
estimated covariance matrix. Eichler et al. (2011) investigate dynamic factor modeling of
locally stationary processes. They estimate the common components of the dynamic factor 
model by the eigenvectors of an estimator of the time varying spectral density matrix. This 
can also be seen as a time varying principal components approach in the frequency domain. 

Cardinali and Nason (2010) introduce the concept of costationarity of two locally stationary
time series where some linear combination of the two processes is stationary. They show
that costationarity implies an error-correction type of formula in which changes in the variance
of one series are reflected by simultaneous balancing changes in the other. Sanderson et al.
(2010) propose a new method of measuring the dependence between non-stationary time
series based on a wavelet coherence between two LSW-processes.

3. Testing of locally stationary processes - in particular tests for stationarity: Among the 
large literature on testing there is a considerable part devoted to testing of stationarity. 
Tests of stationarity have already been proposed and theoretically investigated before the 
framework of local stationarity was created. In those cases the theoretical investigations
mainly consisted in the investigation of the asymptotic distribution of the test statistics 
under the hypothesis of stationarity. 

Priestley and Subba Rao (1969) proposed testing the homogeneity of a set of evolutionary 
spectra evaluated at different points of time. For Gaussian processes and for the purpose 
of a change point detection Picard (1985) developed a test based on the difference between 
spectral distribution functions estimated on different parts of the series and evaluated using 
a supremum type statistic. Giraitis and Leipus (1992) generalized this approach to the 
case of linear processes. Von Sachs and Neumann (2000) developed a test of stationarity 
based on empirical wavelet coefficients estimated using localized versions of the periodogram. 
Paparoditis (2009) developed a nonparametric test for stationarity against the alternative 
of a smoothly time varying spectral structure based on a local spectral density estimate. 
He also investigated the power under the fixed alternative of a locally stationary process.
Paparoditis (2010) tested the assumption of stationarity by evaluating the supremum over
time of an L2-distance between the local periodogram over a rolling segment and an estimator
of the spectral density obtained using the entire time series at hand. The critical values of 
a supremum type test are obtained using a stationary bootstrap procedure. Dwivedi and 
Subba Rao (2011) construct a Portmanteau type test statistic for testing stationarity of a 
time series by using the property that the discrete Fourier transforms of a time series at 
the canonical frequencies are asymptotically uncorrelated if and only if the time series is 
second-order stationary. 
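
The property used in the last test can be illustrated numerically. The sketch below computes the raw sample covariance between discrete Fourier transforms at canonical frequencies that are $r$ apart. It is only an illustration of the decorrelation property: the published test statistic involves further standardization that is not reproduced here, and the function name and the variance-modulated noise example are assumptions.

```python
import numpy as np


def dft_covariance(x, r):
    """Sample covariance of the DFT at canonical frequencies omega_k and omega_{k+r}.

    J(omega_k) = (2*pi*T)**(-1/2) * sum_t x_t * exp(-i*t*omega_k), omega_k = 2*pi*k/T.
    The index k + r is taken modulo T.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    J = np.fft.fft(x - x.mean()) / np.sqrt(2.0 * np.pi * T)
    return np.mean(J * np.conj(np.roll(J, -r)))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    T = 2048
    eps = rng.standard_normal(T)                       # stationary white noise
    u = np.arange(T) / T
    x = np.sqrt(1.0 + 0.9 * np.cos(2.0 * np.pi * u)) * eps   # time varying variance
    print(abs(dft_covariance(eps, 1)))   # close to zero
    print(abs(dft_covariance(x, 1)))     # clearly away from zero
```

For the stationary series the lag-1 DFT covariance is of order $T^{-1/2}$, while for the variance-modulated series it stays bounded away from zero as $T$ grows.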

Tests of general hypotheses are derived in Sakiyama and Taniguchi (2003) who test parametric
composite hypotheses by the Gaussian likelihood ratio test, the Wald test and the
Lagrange multiplier test. Sergides and Paparoditis (2009) develop tests of the hypothesis
that the time varying spectral density has a semiparametric structure. The test introduced
is based on an L2-distance measure in the spectral domain. As a special case they test for
the presence of a tvAR model. A bootstrap procedure is applied to approximate more accurately
the distribution of the test statistic under the null hypothesis. Preuß et al. (2011)
also test semiparametric hypotheses. Their method is based on an empirical version of the
L2-distance between the true time varying spectral density and its best approximation under
the null hypothesis.

Zhou and Wu (2010) construct simultaneous confidence tubes for time varying regression 
coefficients in functional linear models. Using a Gaussian approximation result for non- 
stationary multiple time series, they show that the constructed simultaneous confidence 
tubes have asymptotically correct nominal coverage probabilities. 

4. Bootstrap methods for locally stationary processes: Bootstrap methods are in particular 
needed to derive the asymptotic distribution of test statistics. A time domain local block 
bootstrap procedure for locally stationary processes has been proposed by Paparoditis and 
Politis (2002) and by Dowla et al. (2003). Sergides and Paparoditis (2008) develop a method 
to bootstrap the local periodogram. Their method generates pseudo local periodogram or- 
dinates by combining a parametric time and nonparametric frequency domain bootstrap 
approach. They first fit locally a time varying autoregressive model to capture the essential 
characteristics of the underlying process. A locally calculated non-parametric correction in 
the frequency domain is then used so as to improve upon the locally parametric autore- 
gressive fit. Kreiss and Paparoditis (2011) propose a nonparametric bootstrap method by 
generating pseudo time series which mimic the local second and fourth order moment struc- 
ture of the underlying process. They prove a bootstrap central limit theorem for a general 
class of preperiodogram based statistics.

5. Model misspecification and model selection: Model selection criteria have been heuristically
suggested many times for time varying processes - cf. Ozaki and Tong (1975); Kitagawa
and Akaike (1978) and Dahlhaus (1996b, 1997) among others - in all papers AIC-type
criteria have been suggested for different purposes.

Van Bellegem and Dahlhaus (2006) consider semiparametric estimation and estimate the 
Kullback-Leibler distance between the semiparametric model and the true process. They 
use this estimate then as a model selection criterion. Hirukawa et.al. (2008) propose a 
generalized information criterion based on nonlinear functionals of the time varying spectral 
density. Chandler (2010) investigates how time varying parameters affect order selection. 

Another interesting aspect is that many results of this paper also hold under model
misspecification - for example Theorem 5.1 and the corresponding result for the Block Whittle
estimate from (20). An important example is the case where a stationary model is fitted and
the underlying process in truth is only locally stationary - see Example 4.5 and the more
detailed discussion for stationary Yule-Walker estimates in Dahlhaus (1997), Section 5.

6. Likelihood theory and large deviations: Local asymptotic normality (LAN) is derived in 
the parametric Gaussian case in Dahlhaus (1996b) and Dahlhaus (2000) (cf. Remark 3.3 
in that paper). A nonparametric LAN-result is derived in Sakiyama and Taniguchi (2003) 
and a LAN result under non-Gaussianity in Hirukawa and Taniguchi (2006). In both papers 
the results are applied to asymptotically optimal estimation and testing. For some statistics 
also the asymptotic distribution under contiguous alternatives is derived. Tamaki (2009) 
studies second order asymptotic efficiency of appropriately modified maximum likelihood 
estimators for Gaussian locally stationary processes. 

Large deviations principles for quadratic forms of locally stationary processes are derived in 
Zani (2002) including applications to local spectral density and covariance estimation. Wu 
and Zhou (2011) obtain an invariance principle for non-stationary vector-valued stochastic
processes. They show that the partial sums of non-stationary processes can be approximated 
on a richer probability space by sums of independent Gaussian random vectors. 

7. Recursive estimation: Recursive estimation algorithms are of the form

$\hat\theta_t = \hat\theta_{t-1} + \lambda_t\,H(\mathbf{X}_t,\hat\theta_{t-1})$ (117)

where $\mathbf{X}_t = (X_1,\ldots,X_t)'$. The recursive structure yields an update of the estimate as
soon as the next observation becomes available and the estimate therefore is particularly
of importance in an online situation. For stationary processes the algorithm is used with
$\lambda_t \sim 1/t$ while in nonstationary situations one uses a nondecreasing $\lambda_t$ (constant stepsize
case), that is, the estimate puts stronger weights on the last observations.
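
A minimal sketch of such a recursion in the constant stepsize case, for a tvAR(1) process $X_t = a(t/T) X_{t-1} + \varepsilon_t$. The update direction used here, $X_{t-1}(X_t - \theta X_{t-1})$, is the gradient step for the squared one-step prediction error. This choice, the stepsize value and the function names are assumptions made for the example and are not the specific algorithms analysed in the papers cited below.

```python
import numpy as np


def simulate_tvar1(T, a, seed=0):
    """Simulate X_t = a(t/T) X_{t-1} + eps_t with standard normal eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = a(t / T) * x[t - 1] + eps[t]
    return x


def recursive_ar1(x, step, theta0=0.0):
    """Constant stepsize recursion theta_t = theta_{t-1} + step * H(X_t, theta_{t-1}).

    Assumed update direction: H = x_{t-1} * (x_t - theta_{t-1} * x_{t-1}).
    Returns the whole path of estimates.
    """
    x = np.asarray(x, dtype=float)
    theta = np.empty(len(x))
    theta[0] = theta0
    for t in range(1, len(x)):
        residual = x[t] - theta[t - 1] * x[t - 1]    # one-step prediction error
        theta[t] = theta[t - 1] + step * x[t - 1] * residual
    return theta


if __name__ == "__main__":
    a = lambda u: 0.2 + 0.6 * u                      # slowly varying AR coefficient
    x = simulate_tvar1(20000, a, seed=4)
    theta = recursive_ar1(x, step=0.01)
    print(theta[3000], a(3000 / 20000))              # estimate tracks the curve
    print(theta[-1], a(1.0))
```

A smaller stepsize averages over more past observations (lower variance, slower tracking), a larger one reacts faster at the price of noisier estimates. This mirrors the role of the bandwidth in the kernel estimates.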

Adaptive estimates of the above type have been investigated over the last 30 years in dif- 
ferent scientific communities: by system theorists under the name "recursive identification 
methods" (cf. Ljung (1977); Ljung and Soderstrom (1983)), in the stochastic approxima- 
tion community (cf. Benveniste, Metivier and Priouret (1990); Kushner and Yin (1997)), 
in the neural network community under the name "back-propagation algorithm" (cf. White
(1992) or Haykin (1994)), and in applied sciences, particularly for biological and medical
applications (cf. Schack and Grieszbach (1994)). 

The properties of recursive estimation algorithms have rigorously been investigated in many 
papers under the premise that the underlying true process is stationary. However, for 
nonstationary processes and the constant stepsize case there did not exist for a long time 
a reasonable framework to study theoretically the properties of these algorithms. This has 
changed with the concept of locally stationary processes with its infill asymptotics which
now allows for theoretical investigations of these algorithms. 

In Moulines et.al. (2005) the properties of recursive estimates of tvAR-processes have been 
investigated in the framework of locally stationary processes. The asymptotic properties of 
the estimator have been proved including a minimax result. In Dahlhaus and Subba Rao 
(2007) a recursive algorithm for estimating the parameters of a tvARCH-process has been 
proposed. Again the asymptotic properties of the estimator have been proved. 

8. Inference for the mean curve: Modeling the time varying mean of a locally stationary 
process is an important task which has not been discussed in this overview. In principle 
nearly all known techniques from nonparametric regression may be used such as kernel 
estimates, local polynomial fits, wavelet estimates or others. The situation is however much 
more challenging since the "residuals" are in this case a locally stationary process which 
usually is modeled at the same time. 

In general the topic needs more investigation. Dahlhaus (1996a, 1996b, 1997, 2000) and
Dahlhaus and Neumann (2001) also contain results where the mean is time varying and/or
estimated. A more detailed investigation is contained in Tunyavetchakit (2010) in the context
of time varying AR(p)-processes where the mean curve is estimated in parallel and the
optimal segment length is determined similar to (16).



9. Piecewise constant models: Davis et al. (2005) consider the problem of modeling a class of
nonstationary time series using piecewise constant AR-processes. The number and locations
of the piecewise AR segments, as well as the orders of the respective AR processes, are
determined by the minimum description length principle. The best combination is then
determined by a genetic algorithm. Davis et al. (2008) extend the approach to general parametric
time series models for the segments and illustrate the method with piecewise GARCH-models, stochastic
volatility and generalized state space models.

Locally constant parametric models have also been considered in a non-asymptotic approach 
by Mercurio and Spokoiny (2004) and others where the so-called small modeling bias con- 
dition is used to determine the length of the interval of time homogeneity and to fit the 
parameters - for more details see also Spokoiny (2010). 

10. Long memory processes: Beran (2009) and Palma and Olea (2010) have extended the
concept of local stationarity to long-range dependent processes. While Beran (2009) uses a
nonparametric approach with a local least squares estimate similar to (26), Palma and Olea
(2010) use a parametric approach and use the block Whittle likelihood from (21). Both
papers then investigate the asymptotic properties. Roueff and von Sachs (2011) use a local
log-regression wavelet estimator of the time-dependent long memory parameter and study
its asymptotic properties.

11. Locally stationary random fields: Fuentes (2001) studies different methods for locally 
stationary isotropic random fields with parameters varying across space. In particular she 
uses local Whittle estimates. Eckley et.al. (2010) propose the modeling and analysis of 
image texture by using an extension of a locally stationary wavelet process model for lat- 
tice processes. They construct estimates of a spatially localized spectrum and a localized 
autocovariance which are then used to characterize textures in a multiscale and spatially 
adaptive way. Anderes and Stein (2011) develop a weighted local likelihood estimate for the 
parameters that govern the local spatial dependency of a locally stationary random field. 

12. Discrimination Analysis: Discrimination Analysis for locally stationary processes based 
on the Kullback-Leibler divergence as a classification criterion has been investigated in 
Sakiyama and Taniguchi (2004) and for multivariate processes in Hirukawa (2004). Huang 
et.al. (2004) propose a discriminant scheme based on the SLEX-library and a discriminant 
criterion that is also related to the Kullback-Leibler divergence. Chandler and Polonik 
(2006) develop methods for the discrimination of locally stationary processes based on the 
shape of different features. In particular they use shape measures of the variance function as 
a criterion for discrimination and apply their method to the discrimination of earthquakes 
and explosions. Fryzlewicz and Ombao (2009) use a bias-corrected non-decimated wavelet 
transform for classification in the framework of LSW-processes.

13. Prediction: Fryzlewicz et al. (2003) address the problem of how to forecast non-stationary
time series by means of non-decimated wavelets. Using the class of LSW-processes they
introduce a new predictor based on wavelets and derive the prediction equations as a generalization
of the Yule-Walker equations. Van Bellegem and von Sachs (2004) apply locally
stationary processes to the forecasting of several economic data sets such as returns and 
exchange rates. 

14. Finance: There is a growing interest in finance for models with time varying parameters. 
An overview on locally stationary volatility models is given in Van Bellegem (2011). A
general discussion on local stationarity in different areas of finance can be found in Guégan
(2007) - see also Taniguchi et al. (2008). For example, many researchers are convinced that
the observed slow decay of the sample autocorrelation function of absolute stock returns is
not a long memory effect but due to nonstationary changes in the unconditional variance
(cf. Mikosch and Stărică (2004), Stărică and Granger (2005), Fryzlewicz et al. (2006))
leading for example to GARCH-models with time varying parameters.

References for work on tvGARCH-models have been given in Section 3. Other work on applications
of locally stationary processes in finance is for example the work on optimal portfolios
with locally stationary returns of assets by Shiraishi and Taniguchi (2007). Hirukawa (2006) 
uses locally stationary processes for a clustering problem of stock returns. Fryzlewicz (2005) 
models some stylized facts of financial log returns by LSW-processes. Fryzlewicz et. al. 
(2006) consider a locally stationary model for financial log-returns and propose a wavelet 
thresholding algorithm for volatility estimation, in which Haar wavelets are combined with 
the variance-stabilizing Fisz transform. 

15. Further topics: Robinson (1989) also uses the infill asymptotics approach in his work
on nonparametric regression with time varying coefficients. Orbe et al. (2000) estimate 
nonparametrically a time varying coefficients model allowing for seasonal and smoothness 
constraints. Orbe et.al. (2005) estimate the time varying coefficients under shape restric- 
tions over and for locally stationary regressors. Chiann and Morettin (2005) investigate the 
estimation of coefficient curves in time varying linear systems. 

Estimation of time varying quantile curves for nonstationary processes has been done in 
Draghicescu et.al. (2009) and Zhou and Wu (2009). Specification tests of time varying 
quantile curves have been investigated in Zhou (2010).